An error when joining tables using ds.merge

Hello,

While running the example code for ds.merge from your documentation, I get the following error:

```
datashield.errors()
$study1
[1] "Command 'mergeDS("df.x", "df.y", "D$LAB_TSC", "D$LAB_TSC", TRUE, TRUE, \n TRUE, ".x,.y", TRUE, NULL)' failed on 'study1': Error while evaluating 'is.null(base::assign('df.merge', value={dsBase::mergeDS("df.x", "df.y", "D$LAB_TSC", "D$LAB_TSC", TRUE, TRUE, TRUE, ".x,.y", TRUE, NULL)}))' -> Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column\n"

$study2
…

$study3
…
```

I also tried ds.merge on my own test data and got the same error. The tables and the variables to join on seem to be defined correctly, so I don't understand where the problem comes from.

My code (very similar to your example):

```r
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
               url = "https://opal-demo.obiba.org",
               user = "administrator", password = "password",
               table = "TEST.patient", driver = "OpalDriver")
logindata <- builder$build()
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "patient")

ds.dataFrame(x = c("patient$patient_id", "patient$country"),
             completeCases = TRUE,
             newobj = "df.x",
             datasources = connections)
ds.dataFrame(x = c("patient$patient_id", "patient$age"),
             completeCases = TRUE,
             newobj = "df.y",
             datasources = connections)

# Merge data frames using the common variable "patient_id"
ds.merge(x.name = "df.x",
         y.name = "df.y",
         by.x.names = "patient$patient_id",
         by.y.names = "patient$patient_id",
         all.x = TRUE,
         all.y = TRUE,
         sort = TRUE,
         suffixes = c(".x", ".y"),
         no.dups = TRUE,
         newobj = "df.merge",
         datasources = connections)
```

Can you verify that the example code from the documentation works?

Thank you.

Hi Tanja,

In version 6.0, I updated the ds.dataFrame function to remove any "D$" prefix from the column names of the data frames it creates. So the correct way to merge is to pass the arguments "by.x.names" and "by.y.names" without the "D$" part. In your example, remove the "patient$" prefix, i.e. use by.x.names = "patient_id" and by.y.names = "patient_id". Thanks for spotting this issue; I will update the example in the ds.merge documentation.
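The `fix.by(by.x, x)` frame in the error belongs to base R's `merge()`, which mergeDS appears to call on the server. The sketch below reproduces the failure and the fix locally with base R only, no DataSHIELD server needed. `df.x` and `df.y` here are hypothetical toy data frames standing in for the server-side objects, and `by.name` is a made-up helper variable, not part of dsBase.

```r
# Hypothetical toy stand-ins for the server-side df.x and df.y.
# Since version 6.0 (per the reply above), ds.dataFrame drops the "patient$"
# prefix, so the columns are plain "patient_id", "country" and "age".
df.x <- data.frame(patient_id = c(1, 2, 3), country = c("CH", "FR", "DE"))
df.y <- data.frame(patient_id = c(2, 3, 4), age = c(34, 51, 47))

# Passing the old "D$"-style name fails: no column is literally called
# "patient$patient_id", so merge() stops with the same message seen on the server.
err <- tryCatch(
  merge(df.x, df.y, by.x = "patient$patient_id", by.y = "patient$patient_id"),
  error = function(e) conditionMessage(e)
)
print(err)

# Using the bare column name works; all.x = TRUE and all.y = TRUE give a full outer join.
df.merge <- merge(df.x, df.y, by.x = "patient_id", by.y = "patient_id",
                  all.x = TRUE, all.y = TRUE, sort = TRUE)
print(df.merge)

# One way to derive the bare name from a "D$column" string
# (on the client this would be the value passed to by.x.names / by.y.names).
by.name <- sub("^.*\\$", "", "patient$patient_id")
print(by.name)
```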
