List your username and what DataSHIELD functionality you are working on, to ensure collaboration and not duplication of effort:
@becca.wilson integrating GATE text mining with DataSHIELD
@PatRyserWelch developing some classification and prediction algorithms. I am interested in exploring how these algorithms can produce non-disclosive results (see discussion thread)
@stefan.lenz creating artificial data with generative models, e.g. Deep Boltzmann machines
@daniela.zoeller developing multivariable models for high-dimensional data including variable selection, i.e. Distributed Boosting
@daniela.zoeller thinking about time-to-event analysis using pseudo values
@demetris.avraam development of a Generalised Linear Mixed Models function; helping a research team from ISGlobal on the development of the dsOmics package, a set of functions for analysis of omics data; supporting a PhD student from the Vienna University of Technology on the development of decision tree algorithms in DataSHIELD.
@jrgonzalez interested in collaborating with people developing genetic data analysis using PLINK and/or VCF files (GWAS, CNV, mosaicisms, genetic inversions, …; see https://isglobal-brge.github.io/book_omic_association/). Leading the development of the dsOmics package (the name may change to dsBioC), which aims to integrate Bioconductor data infrastructures into DataSHIELD to allow omic data analysis (joint collaboration with @demetris.avraam and @yannick).
It has been a while since I dipped my toe into DataSHIELD development. I know that on the wiki there are some guidelines for development, but I'm not sure if these are up to date. Is there a current best practice on how to proceed, or do people have their own personal solutions?
For example, is it better to work on a single instance to begin with, developing client and server functions in the same R session? Then you would just invoke "server side" functions directly from the "client" without using datashield.assign etc. Then, when everything is working correctly, would you split this out into a true client-server setup?
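To make the single-session idea concrete, here is a minimal base R sketch. The function names `meanDS_dev` and `ds_mean_dev` are hypothetical and exist only for illustration; an ordinary environment stands in for the server's R session, and no DataSHIELD package is involved.

```r
# "Server-side" function (hypothetical name): returns only an aggregate value.
meanDS_dev <- function(x) {
  mean(x, na.rm = TRUE)
}

# "Client-side" function (hypothetical name): during early development it
# calls the server-side function directly instead of going through
# datashield.aggregate().
ds_mean_dev <- function(symbol, server_env) {
  values <- get(symbol, envir = server_env)
  meanDS_dev(values)
}

# An ordinary environment stands in for the server's R session.
server_env <- new.env()
assign("D_age", c(34, 51, NA, 47), envir = server_env)

result <- ds_mean_dev("D_age", server_env)
print(result)
```

Once the logic behaves as expected, the direct call would be swapped for the real DataSHIELD call and the two functions moved into their client and server packages.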
As a proof of concept I have made a DSI branch in all the datashield client packages (and some server packages as well) and all the testthat tests are now based on DSLite. You can have the client package, the server package and the test data living in the same R session. This is very convenient for development (but still requires testing in a production infrastructure with opal (DSOpal)).
I think I have got there and am now better set up for development.
Here are the steps I have taken (as much for my own records):
Install devtools
Create a project for dsBetaTest using Github, having forked the repository and switched my local repo to DSI
Repeat for the client
Install dsBaseClient from Github - DSI branch
Install dsGraphicsClient and other packages from CRAN (they are required to install dsBetaTestClient)
Repeat for server side packages
Install DSLite from Github.
In Firefox, open dsBetaTestClient project, do some work, save, install
In Chrome, open dsBetaTest project, do some work, save, install (use separate browsers to allow working on 2 projects at once)
In Firefox, set up the DSLite environment, invoke the dsBetaTestClient function that work was done on
See what has happened, go and make changes
Repeat…
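The installation part of these steps could be sketched roughly as below. This is an assumption-laden sketch, not a tested recipe: the repository paths `datashield/dsBaseClient` and `datashield/DSLite` and the `DSI` branch name are inferred from the steps above and may have changed since. The forked dsBetaTest projects and the CRAN dependencies are left out because their exact names depend on your own setup. The calls are wrapped in a helper function (hypothetical name `install_dev_stack`) so that nothing is installed until you call it.

```r
# Sketch only: repository names and the "DSI" ref are assumptions taken from
# the steps above; check that they still exist before running.
install_dev_stack <- function() {
  # devtools provides install_github()
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Client package from the DSI branch on GitHub
  devtools::install_github("datashield/dsBaseClient", ref = "DSI")
  # DSLite from GitHub
  devtools::install_github("datashield/DSLite")
  invisible(TRUE)
}
```

Calling `install_dev_stack()` in a fresh R session would then run the installs in order; restart R afterwards so the updated packages are loaded.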
A question that has now come up is whether it is possible to access objects created in the DSLite environments from the client environment. I know we have the initial dataframes on the client side, but I think it would aid development if you could directly access objects in the other environments. I know that with the "old" setup, if one had admin rights, you could use the opal methods to just return the data. Is an equivalent possible here?
And lastly, while I struggled a bit to begin with, it looks like this will really help with development and is a beautiful solution!
That's exactly what I was expecting from you: a bug report and a new feature suggestion!
The bug you had with an empty configuration is fixed (in DSI) and I have added a new function in DSLite to retrieve the value(s) from the server side environment(s): getDSLiteData (this is NOT a DataSHIELD function of course), where the first argument is the connection objects returned by datashield.login.
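As a rough illustration of how `getDSLiteData` fits into a development session, here is a minimal sketch. It assumes the DSLite and DSI packages are installed and that the script runs in the global environment, where the DSLite driver looks up the server object by name. The data frame `toy`, the table name `TOY` and the server name `sim1` are made up for the example.

```r
library(DSLite)
library(DSI)

# Toy data standing in for a study table (made up for illustration)
toy <- data.frame(id = 1:5, age = c(34, 51, 29, 47, 62))

# In-session DataSHIELD server holding the table
dslite.server <- newDSLiteServer(tables = list(TOY = toy))

# Login details: the url is the name of the DSLiteServer object
builder <- DSI::newDSLoginBuilder()
builder$append(server = "sim1", url = "dslite.server",
               table = "TOY", driver = "DSLiteDriver")
logindata <- builder$build()

# Log in and assign the table to the symbol "D" on the server side
conns <- datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Development-only peek at the server-side object (not a DataSHIELD call)
server_side <- getDSLiteData(conns, "D")
str(server_side$sim1)

datashield.logout(conns)
```

Because `conns` is a list of connections, the result is a list with one element per server. This kind of direct access is only meant for debugging in a DSLite session; it has no equivalent for an analyst working against a production server.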
Your DSLiteServer object is outdated (the newly added internal function is not in your instance): you need to recreate it after updating the DSLite package (and restarting the R session would also be a good idea).
My work now focuses on the development of a Generalised Linear Mixed Models function. In parallel, I'm helping a research team from ISGlobal on the development of the dsOmics package, a set of functions for analysis of omics data, and a PhD student from the Vienna University of Technology on the development of decision tree algorithms in DataSHIELD.
Our paper "Deep generative models in DataSHIELD" has been published:
If you like, it could be added under the category "Biostatistics: proof of principle and formal implementation" in the list of DataSHIELD publications.