How to send 10 messages (using datashield.aggregate) to 10 servers simultaneously?

Dear all,

I have an operation that is performed iteratively. In each iteration, I need to send 10 vectors to the corresponding 10 servers for updating and then aggregate the results on the client. The following code is an example:

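    # One aggregate call per server; sapply() waits for each call to return before the next
    sapply(1:10, function(k) {
      # Encode column k of W as a comma-separated string
      W.text <- paste0(as.character(W[, k]), collapse = ",")
      cally <- call("function1", W.text, X, Y)
      # Send the call to the k-th connection only
      datashield.aggregate(datasources[k], cally)
    })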

Here W is the input matrix, with 10 columns corresponding to the 10 servers. X and Y are the symbols of the data on the servers. I need to send each column of W to the corresponding server for updating, and then aggregate the results on the client. These 10 operations are independent and should be performed simultaneously. However, in my implementation they run sequentially (server by server), which wastes a lot of time.
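Because each column of W travels to its server as a single comma-separated string, the server-side function has to parse it back into numbers. The sketch below shows that encoding and one possible decoding step; encode_column() and decode_column() are hypothetical helpers, not part of DSI or any DataSHIELD package, and the matrix is toy data.

```r
# Hypothetical helper: encode a numeric vector as one comma-separated string,
# the same way the W.text lines in this thread do
encode_column <- function(x) paste0(as.character(x), collapse = ",")

# Hypothetical helper: the inverse step a server-side function could apply
decode_column <- function(text) as.numeric(strsplit(text, ",", fixed = TRUE)[[1]])

# Toy data: 2 rows, one column per (imaginary) server
W <- matrix(c(0.25, -1.5, 3, 0.125), nrow = 2)

W.text <- encode_column(W[, 1])
W.text  # "0.25,-1.5"

# as.character() keeps at most 15 significant digits, so values with more
# digits may not survive the round trip exactly; rounding W first helps
stopifnot(identical(decode_column(W.text), W[, 1]))
```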

How can I send the messages simultaneously, as in the figure below?

[Figure]

Regards, Hank

Hello Hank,

I don’t know whether DSI allows a different message for each server; judging from the source code (DSI/datashield.aggregate.R at master · datashield/DSI · GitHub), I don’t think it does.

What would solve your problem is parallelizing your server calls, using something like “Working with promises in R”.

Hopefully some DataSHIELD core member knows a simpler way to achieve your needs.

Regards, Xavier.

Hi,

DSI::datashield.aggregate() assumes that the same argument is passed to every server. A possible improvement is to pass a named list of arguments, each name corresponding to a server name. That is not much work to implement in DSI, and the good part is that DSI already parallelizes the server calls.

I can add this feature for the next DSI release.
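A sketch of how such a named list could be assembled on the client, one call per column of W. The server names and toy matrix are placeholders, and function1 is the server-side function name from the question. as.symbol() is used on the assumption that X and Y name objects that exist only on the servers, so the client must not evaluate them.

```r
# Placeholder connection names; in practice these would be names(datasources)
server_names <- c("server1", "server2", "server3")

# Toy input: one column per server
W <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2)

callys <- lapply(seq_len(ncol(W)), function(k) {
  W.text <- paste0(as.character(W[, k]), collapse = ",")
  # as.symbol() leaves X and Y unevaluated, to be resolved on the server
  call("function1", W.text, as.symbol("X"), as.symbol("Y"))
})
# Name each call after the server that must receive it
names(callys) <- server_names

deparse(callys$server2)  # function1("0.3,0.4", X, Y)
```

Each element is an unevaluated call, and the list names are what tie each call to a specific server.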

Regards
Yannick

Hi Xescriba,

I agree; I will try foreach to parallelize the function calls.

Regards, Hank

It would be great if you could add this feature!

Regards, Hank

I would appreciate this feature in the next DSI release; I am also developing some functions that may benefit from it.

Have a nice day, Xavier.

Feature request issue added.

Hi,

Please give the development version of DSI (and friends) a try:

remotes::install_github("datashield/DSI")
remotes::install_github("datashield/DSOpal")
remotes::install_github("obiba/opalr")

You can provide a named list of expressions to datashield.aggregate(). For instance, if you have connections to server1, server2 and server3, the following makes different calls to server1 and server2, and does not call server3:

datashield.aggregate(conns,
                     list(server1 = quote(someFunction(D, 123)),
                          server2 = quote(someFunction(G, 456))))

This works similarly with datashield.assign.expr(), datashield.assign.table() and datashield.assign.resource().
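When the per-server values live in client-side variables, base R’s bquote() can splice them into otherwise identical expressions instead of writing each quote() by hand. A minimal sketch reusing the placeholder names someFunction, D and G from the example above; the vectors values and symbols are hypothetical. Only scalar values are spliced here; the code elsewhere in this thread sends vectors as comma-separated text instead.

```r
# Hypothetical per-server values and server-side data symbols
values  <- c(server1 = 123, server2 = 456)
symbols <- c(server1 = "D", server2 = "G")

# Build one expression per server; .() inserts the evaluated client value
exprs <- lapply(names(values), function(s) {
  bquote(someFunction(.(as.symbol(symbols[[s]])), .(values[[s]])))
})
names(exprs) <- names(values)

exprs$server1  # someFunction(D, 123)
```

The resulting list has the same shape as the one in the example above, so it could be passed as datashield.aggregate(conns, exprs).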

Let me know how it goes; I would like to make a release next week (the other new feature is the polling of the servers, to keep them alive during long-running tasks).

Best Yannick

Hi Yannick,

Sorry for the late reply. I will test it in the next two days.

Regards, Hank

Dear Yannick,

I installed these packages according to your instructions:

remotes::install_github("datashield/DSI")
remotes::install_github("datashield/DSOpal")
remotes::install_github("obiba/opalr")

However, I got a “Client error” until I reinstalled opalr from CRAN. The versions of the three packages are now opalr (2.0.0), DSI (1.2.0) and DSOpal (1.2.0).

I modified the commands as follows:

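    # Round W to nDigits decimal places before encoding it as text
    W <- round(W, nDigits)
    # Build one call per server; assumes column k of W belongs to datasources[[k]]
    callys <- lapply(1:nTasks, function(k) {
      W.text <- paste0(as.character(W[, k]), collapse = ",")
      call("function1", W.text, X, Y)
    })
    # Name each call after its target connection
    names(callys) <- names(datasources)
    # A single datashield.aggregate() call; DSI submits the per-server calls in parallel
    iter_update <- datashield.aggregate(datasources, callys)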

The commands worked as expected. One issue is that the server names have to be assigned to callys before calling datashield.aggregate(). Is that necessary? I would suggest removing this limitation.

Anyway, thanks for the update. Please integrate the feature into the release version.

Regards, Hank

Dear Hank

I did not get the message; sorry for the late response. I have released DSI, DSOpal and opalr on CRAN, and all of them need to be updated together.

Yes, you must provide the name of the server to which each call will be submitted. The servers have no “order”, so just providing an unnamed list of calls may result in requests being sent to the wrong server.
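One way to avoid relying on position is to label the columns of W with the server names and check that every name matches a connection before calling datashield.aggregate(). A minimal sketch with placeholder names; build_calls() is a hypothetical helper, and in practice server_names would come from names(datasources).

```r
# Hypothetical helper: one call per column of W, named after colnames(W)
build_calls <- function(W, server_names) {
  if (is.null(colnames(W))) stop("W needs column names matching the servers")
  unknown <- setdiff(colnames(W), server_names)
  if (length(unknown) > 0) {
    stop("No connection named: ", paste(unknown, collapse = ", "))
  }
  calls <- lapply(colnames(W), function(s) {
    W.text <- paste0(as.character(W[, s]), collapse = ",")
    call("function1", W.text, as.symbol("X"), as.symbol("Y"))
  })
  names(calls) <- colnames(W)
  calls
}

server_names <- c("server1", "server2", "server3")
# Columns deliberately not in the same order as server_names
W <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("server3", "server1", "server2")))
callys <- build_calls(W, server_names)
deparse(callys$server1)  # function1("3,4", X, Y)
```

Because each call is matched to a server by name rather than by position, the column order of W no longer matters, and a misspelled server name fails on the client instead of silently targeting the wrong server.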

Regards
Yannick