Function "ds.dataFrameSubset" breaks when using DSLite

Dear community,

I recently started to work with Datashield and ran into trouble using the ds.dataFrameSubset function within the DSLite framework.

Problem: For me, the function ds.dataFrameSubset returns the error “cannot coerce class ‘“function”’ to a data.frame” on all servers, if I use DSLite. However, when I use the servers from the virtual machines, as in the tutorials, the function works perfectly fine.

Code that reproduces the error

library(DSLite)
library(dsBaseClient)

logindata.dslite.cnsim <- setupCNSIMTest()
connections <- datashield.login(logindata.dslite.cnsim, assign = TRUE)
ds.dataFrameSubset(df.name = "D", V1.name = "D$GENDER", V2.name = "1",
                   Boolean.operator = "==", newobj = "CNSIM.subset.Males",
                   datasources= connections)

Code that works (from the tutorial)

library(DSI)
library(DSOpal)
library(dsBaseClient)


builder <- DSI::newDSLoginBuilder()
builder$append(server = "server1",  url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&", driver = "OpalDriver")
builder$append(server = "server2", url = "http://192.168.56.101:8080/",
               user = "administrator", password = "datashield_test&", driver = "OpalDriver")

logindata <- builder$build()

connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

DSI::datashield.assign.table(conns = connections, symbol = "D", table = c("CNSIM.CNSIM1","CNSIM.CNSIM2", "CNSIM.CNSIM3"))
ds.dataFrameSubset(df.name = "D", V1.name = "D$GENDER", V2.name = "1",
                   Boolean.operator = "==", newobj = "CNSIM.subset.Males",
                   datasources= connections)

Has anyone experienced similar problems, or does anyone have an idea what is going on? (It could definitely be a beginner's mistake! :slight_smile:)

Hia manuhuth, welcome to the forums!

Was there an error generated from the code that didn’t work? And if so, could you post it for us to read?

Thanks, Alex

Hi Alex,

thanks for your reply! Sorry, the error message was a bit hidden in my initial post; I should have made it clearer. :slight_smile: The generated error on all servers was: "cannot coerce class ‘function’ to a data.frame"

I kept working on the problem and found that the function call ds.dataFrameSubset(df.name = "D", …) could not find the object D (which is the name of the data frame symbol; see initial post) on the server side. Since D() is a pre-loaded function in the R environment (stats::D), the server then used this D() function instead of the data frame and tried to make a data frame out of the function, which explains the error message.
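A minimal sketch of that name clash in plain R (no DataSHIELD involved). It assumes a fresh session in which no object called D has been created and the stats package is attached, as it is by default; the variable names is_fun, same_as_stats and msg are only illustrative.

```r
# Assumes a fresh R session: the user has not created any object named D.
is_fun <- is.function(D)                 # the name D resolves to a function
same_as_stats <- identical(D, stats::D)  # specifically the derivative function from stats

# Same construction as in the thread: build the call as a string, then evaluate it.
df.name <- "D"
msg <- tryCatch(
  eval(parse(text = paste0("data.frame(", df.name, ")"))),
  error = function(e) conditionMessage(e)
)

# msg holds the error raised when data.frame() is handed a function
print(msg)
```

With a data frame called D visible at the point of evaluation, the same eval() call would return that data frame instead of raising an error.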

If I change the symbol name in my whole script to something else, e.g. “D_test”, the error message from all server sides is that the object “D_test” cannot be found. However, when I run datashield.symbols(connections), the output shows that “D_test” is defined on all servers. Furthermore, all of the subsequent calls work (they work as well when the symbol is “D” instead of “D_test”):

ds.dim(x = 'D_test') 
ds.colnames(x='D_test', datasources = connections)
ds.class(x='D_test$LAB_HDL', datasources = connections) 
ds.quantileMean(x='D_test$LAB_HDL', datasources = connections)
ds.mean(x='D_test$LAB_HDL', datasources = connections)

Trying to trace out the problem further, I found that the server side function that causes the error is “dataFrameSubsetDS1(…)”. The error occurs in line 158, when the function evaluates the string to build the data frame on the server side, which is then not found.

df2subset <- eval(parse(text=df.name.2)) # In my example df.name.2 is the string "data.frame(D_test)", built with paste0("data.frame(", df.name, ")")

I was wondering why the other client side functions are able to find the object “D” or “D_test” but “ds.dataFrameSubset” is not, and why this issue only occurs using DSLite.

Would you have an idea what could cause the described behavior? :slight_smile:

Thanks, Manu

I can reproduce it:

library(DSLite)
library(dsBaseClient)

logindata.dslite.cnsim <- setupCNSIMTest()

connections <- datashield.login(logindata.dslite.cnsim, assign = TRUE, symbol = "D")
ds.dataFrameSubset(df.name = "D", V1.name = "D$GENDER", V2.name = "1",
                   Boolean.operator = "==", newobj = "CNSIM.subset.Males",
                   datasources= connections)
datashield.errors()
datashield.logout(connections)

The output is:

> ds.dataFrameSubset(df.name = "D", V1.name = "D$GENDER", V2.name = "1",
+                    Boolean.operator = "==", newobj = "CNSIM.subset.Males",
+                    datasources= connections)
  Aggregated (dataFrameSubsetDS1("D", "D$GENDER", "1", 1, NULL, NULL, FALSE)) [==========] 100% / 0s
Error: There are some DataSHIELD errors, list them with datashield.errors()
> datashield.errors()
$sim1
[1] "cannot coerce class ‘\"function\"’ to a data.frame"

$sim2
[1] "cannot coerce class ‘\"function\"’ to a data.frame"

$sim3
[1] "cannot coerce class ‘\"function\"’ to a data.frame"

Yannick

Thanks to both of you! I have just written a post where I described my progress with the problem in more detail but it was deleted by the forum’s automatic spam filter. In short what I wrote:

The basic problem is that the function dataFrameSubsetDS1() does not find the object D on the server side. But there is a built-in R function D(), which is then used instead, causing the error in line 158 when calling:

df.name.2 <- paste0("data.frame(",df.name,")")
df2subset <- eval(parse(text=df.name.2)) 

However, datashield.symbols(connections) shows that “D” exists on the servers and other client side functions like ds.dim(x = 'D') work.

Thanks, Manu

This is caused by the eval() call in the dataFrameSubsetDS1() function, which is not compatible with DSLite: the envir = parent.frame() argument is missing.

The same goes with dataFrameSubsetDS2().

Thank you so much Yannick! It works when I add envir = parent.frame() to the respective eval(…) lines in dataFrameSubsetDS1() and dataFrameSubsetDS2() and install the package locally. :clap:

When executed in an R server, the parent frame is the R session. In DSLite it is an environment that encapsulates the “server-side” operations. Specifying envir = parent.frame() makes the code more generic.
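A rough sketch of the difference, in plain R. Here server_env is only a hypothetical stand-in for the environment DSLite uses to hold “server-side” symbols, and subset_default() and subset_parent() are made-up functions that mimic the eval(parse(...)) line quoted above; none of this is the actual dsBase or DSLite code.

```r
# Hypothetical stand-in for the DSLite "server-side" environment.
# new.env() makes the current environment its parent, so the helper
# functions defined below stay reachable from inside it.
server_env <- new.env()
assign("D", data.frame(GENDER = c(0, 1, 1)), envir = server_env)

# Mimics the original code: eval() without envir evaluates in the function's
# own frame, so D is looked up along the function's enclosing environments
# and the search path, where stats::D is found.
subset_default <- function(df.name) {
  eval(parse(text = paste0("data.frame(", df.name, ")")))
}

# Mimics the fix: evaluate in the environment the function was called from.
subset_parent <- function(df.name) {
  eval(parse(text = paste0("data.frame(", df.name, ")")), envir = parent.frame())
}

# Call both functions from inside server_env, as a DSLite-like server would.
res_default <- tryCatch(
  eval(quote(subset_default("D")), envir = server_env),
  error = function(e) conditionMessage(e)
)
res_parent <- eval(quote(subset_parent("D")), envir = server_env)

print(res_default)  # error message: D resolved to the stats function
print(res_parent)   # the data frame stored in server_env
```

On a regular R server the symbol D is assigned in the session's global environment, which is searched before the attached packages, so the version without envir happens to work there as well.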

DSLite should be part of the test suite.

Here is a related issue:

We should try to do away with using “D” as the default name because it is also a function name. This can result in cryptic error messages!!

@manuhuth FYI this issue was fixed a while ago and will be available in the next dsBase version (6.2) which will be released soon.


Huh, I’m sorry this happened to you Manu. I’ve checked the admin’s inbox, but unfortunately it doesn’t look like there’s a way to reinstate it :confused:. Anyway, I see Yannick has given a good response, hope all goes well from here!


There is a development task of integrating the new data set into DSLite which remains to be done. Once this is complete, dsBaseClient can be tested with DSLite. This will allow us to find the remaining incompatibilities with DSLite.

Stuart