Hi,
I am pleased to announce that DataSHIELD has a new backend: DSLite. Opal is still the only data repository that supports DataSHIELD; DSLite is a serverless (i.e. pure software) implementation of DSI: the DataSHIELD server-side operations happen in distinct R environments within the same R session as the DataSHIELD client. The datasets being analyzed live on the client side. DSLite also supports workspace save/restore. Its function call filtering is less strict than Opal’s, but that is not a security issue because the individual-level data are accessible on the client anyway.
See the DSLite README for an explanation of the architecture.
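To make the "distinct R environments in the same R session" idea concrete, here is a minimal base-R sketch. It is not DSLite's actual implementation: the toy datasets, the `bmi` column and the helper `make_server_env` are hypothetical names used only for illustration. Each "server" is an isolated environment holding its own `D`, and the "client" only gets aggregated results back.

```r
# Hypothetical stand-ins for three study datasets (not the CNSIM data).
studies <- list(
  study1 = data.frame(bmi = c(22.1, 27.4, 30.2)),
  study2 = data.frame(bmi = c(24.8, 26.0)),
  study3 = data.frame(bmi = c(19.5, 23.3, 28.9, 31.0))
)

# One isolated environment per "server", all living in the current R session.
# Each environment binds its own dataset to the symbol D.
make_server_env <- function(data) {
  env <- new.env(parent = baseenv())
  assign("D", data, envir = env)
  env
}
server_envs <- lapply(studies, make_server_env)

# "Client" side: evaluate the same aggregate expression in every environment
# and collect only the summary value, not the rows themselves.
aggregate_call <- quote(mean(D$bmi))
results <- lapply(server_envs, function(env) eval(aggregate_call, envir = env))
print(results)
```

Because everything runs in one process, nothing stops the user from reading `D` directly, which is why weaker call filtering is acceptable in this setting.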
The benefits of this:
- a very fast and lightweight development cycle for new DataSHIELD functions: VMs and data uploads are no longer needed, and everything can happen on the developer’s workstation.
- combined analysis of remotely accessible datasets in a secure data repository (Opal) and local datasets that cannot be shared.
This also demonstrates the robustness of DSI, as only minor adjustments were needed to support both Opal and DSLite as DataSHIELD backends.
To give it a try:
# install required packages
install.packages("dsBase", repos="https://cran.obiba.org", dependencies=TRUE)
# install development packages
devtools::install_github("datashield/DSI")
devtools::install_github("datashield/dsBaseClient", ref = "DSI")
devtools::install_github("datashield/DSLite")
# example with dsBase
library(dsBaseClient)
# explicit load is now required
library(DSLite)
# load the demo datasets into a lightweight, in-session DataSHIELD server
data("CNSIM1")
data("CNSIM2")
data("CNSIM3")
dslite.server <- newDSLiteServer(tables=list(CNSIM1=CNSIM1, CNSIM2=CNSIM2, CNSIM3=CNSIM3))
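# log in with the bundled demo login data and assign the selected variables to D
data("logindata.dslite.demo")
conns <- datashield.login(logindata.dslite.demo, assign=TRUE, variables=c("GENDER","PM_BMI_CONTINUOUS"))
# summary of BMI on each server
ds.summary(x='D$PM_BMI_CONTINUOUS')
# list the symbols defined on each server
ds.ls()
# close the connections
datashield.logout(conns)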
You can also run a mixed analysis across local datasets and remote Opal demo datasets:
# install required packages
install.packages(c("opalr", "dsBase"), repos=c("https://cran.r-project.org", "https://cran.obiba.org"), dependencies=TRUE)
# install development packages
devtools::install_github("datashield/DSI")
devtools::install_github("datashield/dsBaseClient", ref = "DSI")
devtools::install_github("datashield/DSOpal")
devtools::install_github("datashield/DSLite")
# example with dsBase
library(dsBaseClient)
# explicit load is now required
library(DSOpal)
library(DSLite)
# load one local dataset into a lightweight, in-session DataSHIELD server
data("CNSIM2")
dslite.server <- newDSLiteServer(tables=list(CNSIM2=CNSIM2))
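# prepare login data: one entry per study, mixing Opal and DSLite drivers
server <- c("study1", "study2", "study3")
# for the DSLite entry, the url is the name of the DSLiteServer object
url <- c("https://opal-demo.obiba.org", "dslite.server", "https://opal-demo.obiba.org")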
user <- c("administrator", "", "administrator")
password <- c("password", "", "password")
table <- c("datashield.CNSIM1", "CNSIM2", "datashield.CNSIM3")
options <- rep("", 3)
driver <- c("OpalDriver", "DSLiteDriver", "OpalDriver")
logindata.mixed.demo <- data.frame(server,url,user,password,table,options,driver)
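# log in to all three studies and assign each table to D
conns <- datashield.login(logindata.mixed.demo, assign = TRUE)
# run the same analyses across the remote and local studies
ds.summary(x='D$LAB_TSC')
ds.mean(x='D$PM_BMI_CONTINUOUS')
ds.ls()
# close the connections
datashield.logout(conns)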
Cheers,
Yannick