Fill in missing values of variable with values of second variable: ds.replaceNA?

SophieBlaauwendraad · 5 December 2022 14:00

Hi, For my analysis, I’m trying to replace the missing values of a blood pressure variable (“analysis_df_chemicals$bpage.24”) with values of another variable (“analysis_df_chemicals$bpage.25”).

ds.replaceNA(x = "analysis_df_chemicals$sbpav_.24", forNA = "analysis_df_chemicals$sbpav_.25", newobj = "sbp2to5", datasources = connections)

However, this gives the following error: “Command ‘replaceNaDS(analysis_df_chemicals$sbpav_.24, vectorDS(analysis_df_chemicals$sbpav_.25))’ failed on ‘genr’: No such DataSHIELD ‘ASSIGN’ method with name: vectorDS”

Does anyone know how to solve this, or how to combine the values of two numeric variables otherwise (e.g. replace the missing of the first variable by values of the second variable)?

Many thanks! Sophie

demetris.avraam · 6 December 2022 08:26

Hi Sophie,

You need to follow this process:

# replace NAs with zeros
ds.replaceNA(x = "analysis_df_chemicals$sbpav_.24", forNA = '0', newobj = 'var1')
ds.replaceNA(x = "analysis_df_chemicals$sbpav_.25", forNA = '0', newobj = 'var2')

# create a variable 'var3' with ones if analysis_df_chemicals$sbpav_.24 is NA and zeros otherwise
ds.Boole(V1 = "var1", V2 = "0", Boolean.operator = '==', newobj='var3')
ds.table('var3')

# create a variable 'var4' with zeros if analysis_df_chemicals$sbpav_.24 is NA and ones otherwise
ds.Boole(V1 = "var1", V2 = "0", Boolean.operator = '!=', newobj='var4')
ds.table('var4')

# multiply 'var4' with 'var1'; the product will be the value of analysis_df_chemicals$sbpav_.24 
# if it is not missing and zero if analysis_df_chemicals$sbpav_.24 is missing
ds.make(toAssign = "var4*var1", newobj='var5')

# multiply 'var3' with 'var2', the product will be the value of analysis_df_chemicals$sbpav_.25 
# if analysis_df_chemicals$sbpav_.24 is missing and zero otherwise
ds.make(toAssign = "var3*var2", newobj='var6')

# add var5 and var6
ds.make(toAssign = "var5 + var6", newobj='var7')

# if the sum is zero it means that both variables were NA initially, so replace zeros with NAs
ds.recodeValues(var.name = 'var7', values2replace.vector = '0', new.values.vector = 'NA', newobj='var8')

# do some checks:
ds.summary("analysis_df_chemicals$sbpav_.24")
ds.summary("var8")

ds.numNA("analysis_df_chemicals$sbpav_.24")
ds.numNA("var8")

Note 1: I assume that both variables contain only positive numbers (greater than zero). You can check that by running ds.summary or ds.histogram on the variables. If they include zeros, then we need to slightly change the code above.

Note 2: Remember that some functions block names of objects with more than 20 characters, so you might need to rename your input objects.
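
To see what the masking arithmetic above does, and why Note 1 matters, here is a small local simulation in base R. It is only a sketch: it runs on made-up vectors (x and y stand in for sbpav_.24 and sbpav_.25) and uses ifelse() and plain comparisons as assumed stand-ins for ds.replaceNA, ds.Boole and ds.recodeValues. It does not call any DataSHIELD function.

```r
# Local base-R simulation of the masking steps; no DataSHIELD server involved.
# x and y are made-up stand-ins for sbpav_.24 and sbpav_.25.
x <- c(100, NA, 110, NA, 0)
y <- c(101, 95, NA, NA, 99)

var1 <- ifelse(is.na(x), 0, x)        # stand-in for ds.replaceNA(..., forNA = '0')
var2 <- ifelse(is.na(y), 0, y)
var3 <- as.numeric(var1 == 0)         # stand-in for ds.Boole with '=='
var4 <- as.numeric(var1 != 0)         # stand-in for ds.Boole with '!='
var5 <- var4 * var1                   # x where x is present, otherwise 0
var6 <- var3 * var2                   # y where x is missing, otherwise 0
var7 <- var5 + var6
var8 <- ifelse(var7 == 0, NA, var7)   # stand-in for recoding 0 to NA

# What a direct "use y where x is missing" rule would give
expected <- ifelse(is.na(x), y, x)

print(var8)
print(expected)
```

The two results agree everywhere except the last element, where x holds a genuine zero: the masking steps treat that zero as missing and take the value from y instead. That is the case Note 1 warns about.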

Let me know if something is not clear or not working.

Thanks, Demetris

SophieBlaauwendraad · 8 December 2022 07:47

Dear Demetris,

Thank you for your code; it is very clear.

Unfortunately, it’s still giving the same error after running the first 2 lines. I indeed have no negative values in my variables, and I made sure all object names are shorter than 20 characters. Is it possible that the ds.replaceNA function is not available in GenR (I have installed the right packages, however)? Or could there be another explanation?

#We need to make variables for these age categories in which we take the youngest age in case of two measurements.
ds.histogram("df_chem$sbpav_.24")
ds.histogram("df_chem$sbpav_.25")

# replace NAs with zeros
ds.replaceNA(x = 'df_chem$sbpav_.24', forNA = '0', newobj = 'var1') 
ds.replaceNA(x = 'df_chem$sbpav_.25', forNA = '0', newobj = 'var2') 
datashield.errors()

$genr
[1] "Command 'replaceNaDS(df_chem$sbpav_.25, vectorDS(0))' failed on 'genr': No such DataSHIELD 'ASSIGN' method with name: vectorDS"

Many thanks!

SRingshandl · 8 December 2022 14:15

Hi Sophie.

I’d guess something is wrong with your installation of dsBase methods. Anyway, here’s a possible answer without ds.replaceNA.

As it takes a different route and I’m not perfectly sure about the dataframe sorting, could somebody from the DataSHIELD team check my hypothesis that NAs will go to the top? And also whether I’m keeping rows intact as they should be.

In the first step you have to select numbers depending on your data. I knew that eos_wert can only be positive numbers and NA. In case you have negative numbers you have to select values that include/exclude all your values.

Please adapt to your dataframe. I called mine D. eos_wert would be your column bpage24 and alterbeiaufnahme would be bpage25.

#1. split dataframe in 2 parts. Available numbers and NAs
ds.dataFrameSubset("D", "D$eos_wert", "0", ">=", newobj = "eos_available") #select every number but no NAs
ds.dataFrameSubset("D", "D$eos_wert", "-1", "<=", keep.NAs = TRUE, newobj = "eos_NAs") #no number goes through that but NAs

#2. merge 2 parts back together. Dataframe is now sorted so that NAs will be at the top
ds.rbind(c("eos_NAs", "eos_available"), newobj = "sorted_D")
ds.dataFrame("sorted_D", newobj = "sorted_D") #why does ds.rbind output a matrix instead of a dataframe?! You need this.

#3. select new column name
ds.assign("eos_NAs$alterbeiaufnahme", newobj = "NAs_replaced") # select a meaningful name here! It will be your new column name!

#4. merge numbers that are available and numbers from other column. Same order as above!
ds.rbind(c("NAs_replaced", "eos_available$eos_wert"), newobj = "replaced")

#5. combine with sorted dataframe from step 2
ds.cbind(c("sorted_D", "replaced"), newobj = "replaced_D") # You could overwrite your original dataframe if you want to

#6. check your new column at the end. It has the name from step 3!
ds.colnames("replaced_D")
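
For anyone who wants to check the row alignment locally first, the steps above can be sketched in base R. This is only an illustration under assumptions: D and its values are made up, the id column is a hypothetical addition to make the reordering visible, and base rbind/cbind stand in for ds.rbind/ds.cbind (whose behaviour on the server may differ, e.g. the matrix output mentioned in step 2).

```r
# Local base-R sketch of the split-and-stack idea; D and its values are made up.
D <- data.frame(id = 1:5,
                eos_wert = c(4, NA, 7, NA, 2),
                alterbeiaufnahme = c(40, 50, 60, 70, 80))

# 1. split into rows with a value and rows where eos_wert is NA
eos_available <- D[!is.na(D$eos_wert), ]
eos_NAs       <- D[is.na(D$eos_wert), ]

# 2. stack them again, NA rows first
sorted_D <- rbind(eos_NAs, eos_available)

# 3./4. fallback values for the NA rows, then the available values, in the same order
NAs_replaced <- c(eos_NAs$alterbeiaufnahme, eos_available$eos_wert)

# 5. attach the new column to the sorted dataframe
replaced_D <- cbind(sorted_D, NAs_replaced)

print(replaced_D)
```

In this toy example each row keeps its own id, eos_wert and alterbeiaufnahme, and the new column holds alterbeiaufnahme only where eos_wert is NA. The row order changes (NA rows come first), which matters if the result is later combined with objects that are still in the original order.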

Best Stephan

demetris.avraam · 8 December 2022 19:20

Hi Sophie,

Can you please check which version of dsBaseClient you are using on the client side? You can check that with the command sessionInfo(). Since you are using GenR, the version of the server-side dsBase is 6.1; you can confirm this with the datashield.pkg_status() command.
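
A minimal sketch of the comparison itself, in base R. The version strings are the ones that appear in this thread; major_minor is a hypothetical helper, and treating agreement on the major.minor components as sufficient is an assumption (the thread only says the client and server versions should be the same). In a real session the client version could be read with packageVersion("dsBaseClient") instead of being typed in.

```r
# Version strings taken from this thread; replace them with your own.
client_version <- package_version("6.2.0")  # dsBaseClient on the client side
server_version <- package_version("6.1")    # dsBase on the server side

# Hypothetical helper: reduce a version to its "major.minor" part.
major_minor <- function(v) {
  parts <- strsplit(as.character(v), ".", fixed = TRUE)[[1]]
  paste(parts[1:2], collapse = ".")
}

versions_match <- major_minor(client_version) == major_minor(server_version)
print(versions_match)
```

With these values the check reports a mismatch, which fits the 'No such DataSHIELD ASSIGN method' error: the newer client calls a server-side method (vectorDS) that the older server does not provide.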

@SRingshandl yes through the steps you proposed, after the ds.rbind the NAs will appear at the top, however you need to use the ds.rbind with caution as it converts character variables to numerics and therefore might cause problems in other steps of data processing/analysis.

Best, Demetris

SophieBlaauwendraad · 9 December 2022 10:25

Hi Demetris,

Indeed, I’m using 6.2.0 according to sessionInfo(). And indeed GenR has 6.1. Do I understand correctly that this might be the reason ds.replaceNA is not working?

Thanks! Sophie

demetris.avraam · 9 December 2022 10:40

Hi Sophie,

Yes, this is the reason. You are using the LifeCycle (molgenis) central analysis server, right? If so, then you don’t need to install dsBaseClient yourself, because the package is already there, in the same version as the dsBase package on the servers.

So to solve this: go to the list of packages and delete the dsBaseClient 6.2, then stop and restart your analysis server and then load the package by library(dsBaseClient) and check again by sessionInfo().
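
If the version does not change after deleting the package, one possible explanation (an assumption, not something confirmed in this thread) is that a personal copy in a user library is found before the copy provided on the analysis server. The sketch below lists every installed copy of a package together with its library path. installed_copies is a hypothetical helper built on base R's installed.packages(); "stats" is used only so the example runs anywhere.

```r
# Hypothetical helper: list every installed copy of a package with its library path.
installed_copies <- function(pkg) {
  ip <- installed.packages()
  ip[ip[, "Package"] == pkg, c("LibPath", "Version"), drop = FALSE]
}

# Library paths are searched in this order when a package is loaded
print(.libPaths())

# "stats" ships with R, so this runs anywhere;
# on the analysis server, use "dsBaseClient" instead.
print(installed_copies("stats"))

# If two copies show up, the unwanted one could be removed from its library, e.g.
# remove.packages("dsBaseClient", lib = "<path of the library holding the 6.2 copy>")
```

After removing a copy, restart the R session before calling library(dsBaseClient) and sessionInfo() again, since an already loaded package stays loaded.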

SophieBlaauwendraad · 9 December 2022 11:08

Hi Demetris! I did as you said, unfortunately, the version is still 6.2.0 on the server side. My code is as follows. Is there anything wrong in my code?

install.packages('DSI')
install.packages('DSOpal', dependencies=TRUE)
install.packages('DSMolgenisArmadillo')

#load libraries
library(DSI) #to login and logout
library(DSOpal) #to access the Opal server 
library(dsBaseClient)
library(DSMolgenisArmadillo)

#Check versions of packages; dsBaseClient needs to be the same version as the dsBase package on the server side
sessionInfo()

SophieBlaauwendraad · 9 December 2022 11:09

Sorry, I meant on the client side.

demetris.avraam · 9 December 2022 11:22

Your code is correct but you don’t need to install DSI, DSOpal, DSMolgenisArmadillo because those are also installed on the analysis server.

Try to logout from the server and also restart the R session.

SophieBlaauwendraad · 12 December 2022 08:39

Hi Demetris,

Unfortunately this has not worked (tried it several times). The version of dsBaseClient is still 6.2.0 on the client side. Maybe I can ask to install 6.2 as well on the server side?

Best Sophie

SophieBlaauwendraad · 12 December 2022 09:18

Thank you Stephan, this seems to work! Just curious on what problems might occur in further steps of the analysis (as Demetris mentioned)
