What DataSHIELD functionality are you working on?

List your username and what DataSHIELD functionality you are working on, so we can collaborate and avoid duplicating effort:

  • @becca.wilson integrating GATE text mining with DataSHIELD
  • @PatRyserWelch developing some classification and prediction algorithms. I am interested in exploring how these algorithms can produce non-disclosive results (see discussion thread)
  • @stefan.lenz creating artificial data with generative models, e.g. Deep Boltzmann machines
  • @daniela.zoeller developing multivariable models for high-dimensional data including variable selection, i.e. Distributed Boosting
  • @daniela.zoeller thinking about time-to-event-analysis using pseudo values
  • @paularaissa distributed data imputation
  • @tombishop study level meta analysis of mixed models
  • @bono extending inferential scope within DataSHIELD, using synthetic data generated from disclosable summaries only (https://github.com/bonorico/gcipdr)
  • @demetris.avraam development of a Generalised Linear Mixed Models function; helping a research team from ISGlobal on the development of the dsOmics package, providing a list of functions for the analysis of omics data; supporting a PhD student from the Vienna University of Technology on the development of decision tree algorithms in DataSHIELD.
  • @jrgonzalez interested in collaborating with people developing genetic data analysis using PLINK and/or VCF files (GWAS, CNV, mosaicisms, genetic inversions, … see https://isglobal-brge.github.io/book_omic_association/). Leading the development of the dsOmics package (maybe the name will be changed to dsBioC) that aims to integrate Bioconductor data infrastructures into DataSHIELD to allow omic data analysis (joint collaboration with @demetris.avraam and @yannick).


It has been a while since I dipped my toe into DataSHIELD development. I know there are some development guidelines on the wiki, but I'm not sure if they are up to date. Is there a current best practice on how to proceed, or do people have their own personal solutions?

For example, is it better to work on a single instance to begin with, developing client and server functions in the same R session? You would then invoke 'server-side' functions directly from the 'client' without using datashield.assign etc., and when everything is working correctly, split this out into a true client-server setup.

Use DSI combined with DSLite! :slight_smile:

As a proof of concept I have made a DSI branch in all the DataSHIELD client packages (and some server packages as well), and all the testthat tests are now based on DSLite. You can have the client package, the server package and the test data living in the same R session. This is very convenient for development (but it still requires testing in a production infrastructure with Opal, via DSOpal).
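To make this concrete, here is a minimal sketch of what a combined client/server session with DSLite might look like. The package names, table name and builder calls follow the usual DSLite conventions but are illustrative; check the DSLite documentation for the exact API.

```r
library(DSI)
library(DSLite)
library(dsBase)        # server-side package, loaded into the same session
library(dsBaseClient)  # client-side package

# Illustrative test data
testdata <- data.frame(x = rnorm(20), y = rnorm(20))

# An in-memory server hosting the table in this R session
dslite.server <- newDSLiteServer(tables = list(D = testdata))

# Point the DSI login data at the DSLite driver
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1", url = "dslite.server",
               table = "D", driver = "DSLiteDriver")
logindata <- builder$build()

conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Client-side calls now run against the in-session server
ds.mean("D$x", datasources = conns)

DSI::datashield.logout(conns)
```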

Yannick

This is what I was hoping, but maybe I am being slow and not realising how development would actually work :slight_smile:

Maybe I will just jump straight in and see how it goes. I think the part I am struggling to understand is the workflow. I think it might be like this:

  1. Fork client repo on GitHub, create client project on local machine through RStudio
  2. Repeat for server repo
  3. Open the client project, edit/add a new function, build the package, commit to local Git, push to GitHub
  4. Open the server project, edit/add a new function, build the package, commit to local Git, push to GitHub
  5. Initiate DSLite, pulling the appropriate packages from my GitHub
  6. Try using it!

I think my concern is that you have to go through steps 3-6 for each small bit of code you write. Am I correct or have I missed the point completely?

The workflow is much simpler; you do not need to push to GitHub. You need:

  • one RStudio window with the client package (and the test data)
  • and another RStudio window with the server package.

Then:

  1. Server window: edit, check + install package locally
  2. Client window: edit, check + install package locally and test using testthat + DSLite
  3. Repeat…

Finally: commit in git when functionality is done.
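The loop above could look something like the following in practice (paths and package names are illustrative, matching the dsBetaTest example discussed later in this thread):

```r
# In the server window (path is illustrative)
devtools::document("~/dev/dsBetaTest")   # regenerate docs/NAMESPACE
devtools::install("~/dev/dsBetaTest")    # install the edited server package locally

# In the client window
devtools::install("~/dev/dsBetaTestClient")
devtools::test("~/dev/dsBetaTestClient") # testthat suite running against DSLite
```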

I think I have now got myself set up for development.

Here are the steps I have taken (as much for my own records :slight_smile:):

  1. Install devtools
  2. Create a project for dsBetaTest using GitHub, having forked the repository and switched my local repo to the DSI branch
  3. Repeat for the client
  4. Install dsBaseClient from GitHub (DSI branch)
  5. Install dsGraphicsClient and other packages from CRAN (they are required to install dsBetaTestClient)
  6. Repeat for the server-side packages
  7. Install DSLite from GitHub
  8. In Firefox, open the dsBetaTestClient project, do some work, save, install
  9. In Chrome, open the dsBetaTest project, do some work, save, install (separate browsers allow working on two projects at once)
  10. In Firefox, set up the DSLite environment and invoke the dsBetaTestClient function that was worked on
  11. See what happens, go and make changes
  12. Repeat…

A question that has now come up: is it possible to access objects created in the DSLite environments from the client environment? I know we have the initial data frames on the client side, but I think it would aid development if you could directly access objects in the other environments. I know that with the 'old' set-up, if one had admin rights, you could use the Opal methods to just return the data. Is an equivalent possible here?

And lastly, while I struggled a bit to begin with, it looks like this will really help with development and is a beautiful solution!

Hi Tom,

That's exactly what I was expecting from you: a bug report and a new feature suggestion!

The bug you had with an empty configuration is fixed (in DSI), and I have added a new function in DSLite to retrieve the value(s) from the server-side environment(s): getDSLiteData (this is NOT a DataSHIELD function, of course), where the first argument is the list of connection objects returned by datashield.login.

Cheers
Yannick

Hi Yannick,

I'm glad I met expectations!

The new function is exactly what I had in mind. I am trying to use it, but am having trouble:

> DSLite::getDSLiteData(conns,'D')
Error in getDSLiteData(conn, symbol) : attempt to apply non-function

Am I using incorrect syntax?

thanks

Tom

Your DSLiteServer object is outdated (the newly added internal function is not in your instance): you need to recreate it after updating the DSLite package (and an R session restart would be good too).
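In other words, after updating DSLite the fix is roughly the following (object and table names are assumed from the set-up described earlier in the thread):

```r
# Restart R, then rebuild the DSLiteServer so it picks up the new internals
library(DSLite)
dslite.server <- newDSLiteServer(tables = list(D = testdata))
conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Retrieve the server-side value(s) of symbol 'D'
# (a DSLite debugging helper, not a DataSHIELD function)
DSLite::getDSLiteData(conns, "D")
```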

Ah yes. Now all OK, thanks!

My work now focuses on the development of a Generalised Linear Mixed Models function. In parallel, I'm helping a research team from ISGlobal on the development of the dsOmics package, providing a list of functions for the analysis of omics data, and a PhD student from the Vienna University of Technology on the development of decision tree algorithms in DataSHIELD.

Our paper "Deep generative models in DataSHIELD" has been published:

If you like, it could be added under the category "Biostatistics: proof of principle and formal implementation" in the list of DataSHIELD publications.


Hi Stefan,

Great news! We'll definitely add it to our publications list.

Best wishes, Elaine


Dear All, Survival functionality is now available in DataSHIELD in the dsSurvival package.

A preprint describing this is also available named:

dsSurvival: Privacy preserving survival models for federated individual patient meta-analysis in DataSHIELD

Kind regards Soumya