What DataSHIELD functionality are you working on?

List your username and what DataSHIELD functionality you are working on, so we can collaborate and avoid duplicating effort:

  • @becca.wilson integrating GATE text mining with DataSHIELD
  • @PatRyserWelch developing some classification and prediction algorithms. I am interested in exploring how these algorithms can produce non-disclosive results (see discussion thread)
  • @stefan.lenz creating artificial data with generative models, e.g. Deep Boltzmann machines
  • @daniela.zoeller developing multivariable models for high-dimensional data including variable selection, i.e. Distributed Boosting
  • @daniela.zoeller thinking about time-to-event-analysis using pseudo values
  • @paularaissa distributed data imputation
  • @tombishop study level meta analysis of mixed models
  • @bono extending inferential scope within DataSHIELD, using synthetic data generated from disclosable summaries only (https://github.com/bonorico/gcipdr)
  • @demetris.avraam development of a Generalised Linear Mixed Models function; helping a research team from ISGlobal on the development of the dsOmics package, providing a list of functions for the analysis of omics data; supporting a PhD student from the Vienna University of Technology on the development of decision tree algorithms in DataSHIELD.
  • @jrgonzalez interested in collaborating with people developing genetic data analysis using PLINK and/or VCF files (GWAS, CNV, mosaicisms, genetic inversions, … see https://isglobal-brge.github.io/book_omic_association/). Leading the development of the dsOmics package (maybe the name will be changed to dsBioC) that aims to integrate Bioconductor data infrastructures into DataSHIELD to allow omic data analysis (joint collaboration with @demetris.avraam and @yannick).


It has been a while since I dipped my toe into DataSHIELD development. I know there are some development guidelines on the wiki, but I'm not sure if they are up to date. Is there a current best practice on how to proceed, or do people have their own personal solutions?

For example, is it better to work on a single instance to begin with, developing client and server functions in the same R session? You would then invoke 'server-side' functions directly from the 'client' without using datashield.assign etc., and when everything is working correctly, split this out into a true client-server setup.

Use DSI combined with DSLite! :slight_smile:

As a proof of concept I have made a DSI branch in all the DataSHIELD client packages (and some server packages as well), and all the testthat tests are now based on DSLite. You can have the client package, the server package and the test data living in the same R session. This is very convenient for development (but it still requires testing in a production infrastructure with Opal, via DSOpal).
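To make this concrete, here is a minimal sketch of what a combined client/server session with DSLite might look like. The package names, table name and builder calls follow the usual DSLite conventions but are illustrative; check the DSLite documentation for the exact API.

```r
library(DSI)
library(DSLite)
library(dsBase)        # server-side package, loaded into the same session
library(dsBaseClient)  # client-side package

# Illustrative test data
testdata <- data.frame(x = rnorm(20), y = rnorm(20))

# An in-memory server hosting the table in this R session
dslite.server <- newDSLiteServer(tables = list(D = testdata))

# Point the DSI login data at the DSLite driver
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1", url = "dslite.server",
               table = "D", driver = "DSLiteDriver")
logindata <- builder$build()

conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Client-side calls now run against the in-session server
ds.mean("D$x", datasources = conns)

DSI::datashield.logout(conns)
```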

Yannick

This is what I was hoping, but maybe I am being slow and not realising how development would actually work :slight_smile:

Maybe I will just jump straight in and see how it goes. I think the part I am struggling to understand is the workflow. I think it might be like this:

  1. Fork client repo on GitHub, create client project on local machine through RStudio
  2. Repeat for server repo
  3. Open the client project, edit/add a new function, build the package, commit to local Git, push to GitHub
  4. Open the server project, edit/add a new function, build the package, commit to local Git, push to GitHub
  5. Initiate DSLite, pulling the appropriate packages from my GitHub
  6. Try using it!

I think my concern is that you have to go through steps 3-6 for each small bit of code you write. Am I correct or have I missed the point completely?

The workflow is much simpler; you do not need to push to GitHub. You need:

  • one RStudio window with the client package (and the test data)
  • and another RStudio window with the server package.

Then:

  1. Server window: edit, check + install package locally
  2. Client window: edit, check + install package locally and test using testthat + DSLite
  3. Repeat…

Finally: commit in git when functionality is done.
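The loop above could look something like the following in practice (paths and package names are illustrative, matching the dsBetaTest example discussed later in this thread):

```r
# In the server window (path is illustrative)
devtools::document("~/dev/dsBetaTest")   # regenerate docs/NAMESPACE
devtools::install("~/dev/dsBetaTest")    # install the edited server package locally

# In the client window
devtools::install("~/dev/dsBetaTestClient")
devtools::test("~/dev/dsBetaTestClient") # testthat suite running against DSLite
```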

I think I have now got myself set up for development.

Here are the steps I have taken (as much for my own records :slight_smile:):

  1. Install devtools
  2. Create a project for dsBetaTest using GitHub, having forked the repository and switched my local repo to the DSI branch
  3. Repeat for the client
  4. Install dsBaseClient from GitHub (DSI branch)
  5. Install dsGraphicsClient and other packages from CRAN (they are required to install dsBetaTestClient)
  6. Repeat for the server-side packages
  7. Install DSLite from GitHub
  8. In Firefox, open the dsBetaTestClient project, do some work, save, install
  9. In Chrome, open the dsBetaTest project, do some work, save, install (separate browsers allow working on two projects at once)
  10. In Firefox, set up the DSLite environment and invoke the dsBetaTestClient function that was worked on
  11. See what happens, go and make changes
  12. Repeat…

A question that has now come up: is it possible to access objects created in the DSLite environments from the client environment? I know we have the initial data frames on the client side, but I think it would aid development if you could directly access objects in the other environments. I know that with the 'old' set-up, if one had admin rights, you could use the Opal methods to just return the data. Is an equivalent possible here?

And lastly, while I struggled a bit to begin with, it looks like this will really help with development and is a beautiful solution!

Hi Tom,

That's exactly what I was expecting from you: a bug report and a new feature suggestion!

The bug you had with an empty configuration is fixed (in DSI), and I have added a new function in DSLite to retrieve the value(s) from the server-side environment(s): getDSLiteData (this is NOT a DataSHIELD function, of course), where the first argument is the list of connection objects returned by datashield.login.

Cheers
Yannick

Hi Yannick,

I'm glad I met expectations!

The new function is exactly what I had in mind. I am trying to use it, but am having trouble:

> DSLite::getDSLiteData(conns,'D')
Error in getDSLiteData(conn, symbol) : attempt to apply non-function

Am I using incorrect syntax?

thanks

Tom

Your DSLiteServer object is outdated (the newly added internal function is not in your instance): you need to recreate it after updating the DSLite package (and an R session restart would be good too).
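In other words, after updating DSLite the fix is roughly the following (object and table names are assumed from the set-up described earlier in the thread):

```r
# Restart R, then rebuild the DSLiteServer so it picks up the new internals
library(DSLite)
dslite.server <- newDSLiteServer(tables = list(D = testdata))
conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Retrieve the server-side value(s) of symbol 'D'
# (a DSLite debugging helper, not a DataSHIELD function)
DSLite::getDSLiteData(conns, "D")
```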

Ah yes. Now all OK, thanks!

My work now focuses on the development of a Generalised Linear Mixed Models function. In parallel, I'm helping a research team from ISGlobal on the development of the dsOmics package, providing a list of functions for the analysis of omics data, and a PhD student from the Vienna University of Technology on the development of decision tree algorithms in DataSHIELD.

Our paper "Deep generative models in DataSHIELD" has been published:

If you like, it could be added under the category "Biostatistics: proof of principle and formal implementation" in the list of DataSHIELD publications.


Hi Stefan,

Great news! We'll definitely add it to our publications list.

Best wishes, Elaine


Dear All, Survival functionality is now available in DataSHIELD in the dsSurvival package.

A preprint describing this is also available named:

dsSurvival: Privacy preserving survival models for federated individual patient meta-analysis in DataSHIELD

Kind regards Soumya