Fill in missing values of variable with values of second variable: ds.replaceNA?

Hi, For my analysis, I’m trying to replace the missing values of a blood pressure variable (“analysis_df_chemicals$bpage.24”) with values of another variable (“analysis_df_chemicals$bpage.25”).

ds.replaceNA(x = "analysis_df_chemicals$sbpav_.24", forNA = "analysis_df_chemicals$sbpav_.25", newobj = "sbp2to5", datasources = connections)

However, this gives the following error: “Command ‘replaceNaDS(analysis_df_chemicals$sbpav_.24, vectorDS(analysis_df_chemicals$sbpav_.25))’ failed on ‘genr’: No such DataSHIELD ‘ASSIGN’ method with name: vectorDS”

Does anyone know how to solve this, or how to combine the values of two numeric variables otherwise (e.g. replace the missing of the first variable by values of the second variable)?

Many thanks! Sophie

Hi Sophie,

You need to follow this process:

# replace NAs with zeros
ds.replaceNA(x = "analysis_df_chemicals$sbpav_.24", forNA = '0', newobj = 'var1')
ds.replaceNA(x = "analysis_df_chemicals$sbpav_.25", forNA = '0', newobj = 'var2')

# create a variable 'var3' with ones if analysis_df_chemicals$sbpav_.24 is NA and zeros otherwise
ds.Boole(V1 = "var1", V2 = "0", Boolean.operator = '==', newobj='var3')

# create a variable 'var4' with zeros if analysis_df_chemicals$sbpav_.24 is NA and ones otherwise
ds.Boole(V1 = "var1", V2 = "0", Boolean.operator = '!=', newobj='var4')

# multiply 'var4' with 'var1'; the product will be the value of analysis_df_chemicals$sbpav_.24 
# if it is not missing and zero if analysis_df_chemicals$sbpav_.24 is missing
ds.make(toAssign = "var4*var1", newobj='var5')

# multiply 'var3' with 'var2', the product will be the value of analysis_df_chemicals$sbpav_.25 
# if analysis_df_chemicals$sbpav_.24 is missing and zero otherwise
ds.make(toAssign = "var3*var2", newobj='var6')

# add 'var5' and 'var6'
ds.make(toAssign = "var5 + var6", newobj='var7')

# if the sum is zero it means that both variables were NA initially, so replace zeros with NAs
ds.recodeValues(var.name = 'var7', values2replace.vector = '0', new.values.vector = 'NA', newobj='var8')

# do some checks:


Note 1: I assume that both variables contain only positive numbers (strictly greater than zero). You can check that by running ds.summary or ds.histogram on the variables. If they include zeros, then we need to slightly change the code above.

Note 2: Remember that some functions block names of objects with more than 20 characters, so you might need to rename your input objects.
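
To make the logic of the steps above easier to follow, here is a minimal sketch in plain base R on ordinary local vectors. It does not call any DataSHIELD function; the vectors x and y and their values are made up for illustration, and the variable names simply mirror the ones used above.

```r
# Local illustration only: plain base R on made-up vectors, not DataSHIELD calls.
x <- c(120, NA, 110, NA)   # stands in for sbpav_.24 (made-up values)
y <- c(118, 125, NA, NA)   # stands in for sbpav_.25 (made-up values)

# Step 1: replace NAs with zeros
var1 <- ifelse(is.na(x), 0, x)
var2 <- ifelse(is.na(y), 0, y)

# Step 2: indicators for "x was missing" (var3) and "x was present" (var4)
var3 <- as.numeric(var1 == 0)
var4 <- as.numeric(var1 != 0)

# Step 3: keep x where present, take y where x was missing, then add the two parts
var5 <- var4 * var1
var6 <- var3 * var2
var7 <- var5 + var6

# Step 4: a sum of zero means both were missing, so restore NA
var8 <- ifelse(var7 == 0, NA, var7)

print(var8)
```

The sketch also shows why Note 1 matters: a genuine zero in x would be indistinguishable from a replaced NA in step 2.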

Let me know if something is not clear or not working.

Thanks, Demetris

Dear Demetris,

Thank you for your code; it is very clear.

Unfortunately, it’s still giving the same error after running the first 2 lines. I indeed have no negative values in my variables, and I made sure all object names are shorter than 20 characters. Is it possible that the ds.replaceNA function is not available in GenR (I have installed the right packages, however)? Or could there be another explanation?

#We need to make variables for these age categories in which we take the youngest age in case of two measurements.

# replace NAs with zeros
ds.replaceNA(x = 'df_chem$sbpav_.24', forNA = '0', newobj = 'var1') 
ds.replaceNA(x = 'df_chem$sbpav_.25', forNA = '0', newobj = 'var2') 
[1] "Command 'replaceNaDS(df_chem$sbpav_.25, vectorDS(0))' failed on 'genr': No such DataSHIELD 'ASSIGN' method with name: vectorDS"

Many thanks!

Hi Sophie.

I’d guess something is wrong with your installation of dsBase methods. Anyway, here’s a possible answer without ds.replaceNA.

As it’s going a different route and I’m not perfectly sure about the dataframe sorting, could somebody from the DataSHIELD team check my hypothesis that NAs will go to the top? And also whether I’m keeping rows intact, as they should be?

In the first step you have to select numbers depending on your data. I knew that eos_wert can only be positive numbers and NA. In case you have negative numbers you have to select values that include/exclude all your values.

Please adapt to your dataframe. I called mine D. eos_wert would be your column bpage24 and alterbeiaufnahme would be bpage25.

#1. split dataframe in 2 parts. Available numbers and NAs
ds.dataFrameSubset("D", "D$eos_wert", "0", ">=", newobj = "eos_available") #select every number but no NAs
ds.dataFrameSubset("D", "D$eos_wert", "-1", "<=", keep.NAs = TRUE, newobj = "eos_NAs") #no number goes through that but NAs

#2. merge 2 parts back together. Dataframe is now sorted so that NAs will be at the top
ds.rbind(c("eos_NAs", "eos_available"), newobj = "sorted_D")
ds.dataFrame("sorted_D", newobj = "sorted_D") #why does ds.rbind output a matrix instead of a dataframe?! You need this.

#3. select new column name
ds.assign("eos_NAs$alterbeiaufnahme", newobj = "NAs_replaced") # select a meaningful name here! It will be your new column name!

#4. merge numbers that are available and numbers from other column. Same order as above!
ds.rbind(c("NAs_replaced", "eos_available$eos_wert"), newobj = "replaced")

#5. combine with sorted dataframe from step 2
ds.cbind(c("sorted_D", "replaced"), newobj = "replaced_D") # You could overwrite your original dataframe if you want to

#6. check your new column at the end. It has the name from step 3!
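
To illustrate the idea and the row-order question, here is a minimal sketch of the same split-and-stack approach in plain base R on a local data frame. It uses no DataSHIELD calls; the data frame D, its id column and all values are made up, so it only shows the intended logic, not how the server-side functions behave.

```r
# Local illustration only: plain base R on a made-up data frame, not DataSHIELD calls.
D <- data.frame(
  id               = 1:4,
  eos_wert         = c(5, NA, 7, NA),
  alterbeiaufnahme = c(50, 60, 70, 80)
)

# 1. split into rows with a value and rows with NA
eos_available <- D[!is.na(D$eos_wert), ]
eos_NAs       <- D[is.na(D$eos_wert), ]

# 2. stack them again, NA rows first
sorted_D <- rbind(eos_NAs, eos_available)

# 3. and 4. build the new column in the same order:
#    fallback values for the NA rows first, then the original values
replaced <- c(eos_NAs$alterbeiaufnahme, eos_available$eos_wert)

# 5. attach the new column to the sorted data frame
replaced_D <- cbind(sorted_D, replaced)

# 6. check: each id should carry eos_wert if present, else alterbeiaufnahme
print(replaced_D)
```

The rows stay intact only because steps 2 and 4 stack the two parts in the same order, which is why the order must not be swapped in either step.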

Best Stephan

Hi Sophie,

Can you please check which version of dsBaseClient you are using on the client side? You can check that with the command sessionInfo(). Since you are using GenR, the version of the server-side dsBase is 6.1; you can confirm this with the datashield.pkg_status() command.

@SRingshandl yes, with the steps you proposed the NAs will appear at the top after the ds.rbind. However, you need to use ds.rbind with caution, as it converts character variables to numerics and might therefore cause problems in other steps of data processing/analysis.

Best, Demetris

Hi Demetris,

Indeed, I’m using 6.2.0 according to sessionInfo(). And indeed GenR has 6.1. Do I understand correctly that this might be the reason ds.replaceNA is not working?

Thanks! Sophie

Hi Sophie,

Yes, this is the reason. You are using the LifeCycle (molgenis) central analysis server, right? If yes, then you don’t need to install dsBaseClient yourself, because the package is already there, in the same version as the dsBase package on the servers.

So to solve this: go to the list of packages and delete dsBaseClient 6.2, then stop and restart your analysis server, then load the package with library(dsBaseClient) and check again with sessionInfo().
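
A minimal sketch of that check in R is below. The helper same_major_minor is hypothetical (it is not part of any package), and it assumes that agreement on the major.minor part of the version is what matters. The server-side call needs a live login, so it is only shown as a comment, with connections being the login object used earlier in this thread.

```r
# Client side: version of the installed dsBaseClient, or NA if it is not installed
client_version <- if (requireNamespace("dsBaseClient", quietly = TRUE)) {
  as.character(utils::packageVersion("dsBaseClient"))
} else {
  NA_character_
}
print(client_version)

# Server side: requires a live connection, so it is not run here.
# DSI::datashield.pkg_status(connections)

# Hypothetical helper (not part of any package): TRUE when the
# major.minor parts of two version strings agree
same_major_minor <- function(client, server) {
  parts <- function(v) as.integer(strsplit(v, ".", fixed = TRUE)[[1]][1:2])
  identical(parts(client), parts(server))
}

# The combination reported in this thread: client 6.2.0, server 6.1
print(same_major_minor("6.2.0", "6.1"))
```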

Hi Demetris! I did as you said, unfortunately, the version is still 6.2.0 on the server side. My code is as follows. Is there anything wrong in my code?

install.packages('DSOpal', dependencies=TRUE)

#load libraries
library(DSI) #to login and logout
library(DSOpal) #to access the Opal server 

#Check versions of packages; dsBaseClient needs to be the same version as the dsBase package on the server side

Sorry, I meant on the client side.

Your code is correct but you don’t need to install DSI, DSOpal, DSMolgenisArmadillo because those are also installed on the analysis server.

Try to logout from the server and also restart the R session.

Hi Demetris,

Unfortunately this has not worked (tried it several times). The version of dsBaseClient is still 6.2.0 on the client side. Maybe I can ask to install 6.2 as well on the server side?

Best Sophie

Thank you Stephan, this seems to work! I’m just curious what problems might occur in further steps of the analysis (as Demetris mentioned).