Within LifeCycle we have made an inventory of what is needed to do the actual analysis. The list is prioritised by what is most needed. It will evolve and be amended over time as LifeCycle members become more familiar with R and DataSHIELD.
The data management tools are really needed to do the first analysis and would preferably be included in the September release. Regarding the analysis tools, the LifeCycle WP leaders need to decide which packages should be developed first. The intention is to determine this at the next WP leaders meeting, around mid-June.
I'd like to recall that, as shown at the DataSHIELD workshop in
Newcastle last November, a simulation-based approach to
approximating GLMMs in a federated fashion, as compared to the
distributed regression approach of Elinor for instance, is also
already available.
Also see the example
(line 560) from the submitted documentation, now under revision.
The software could be made more DataSHIELD user-friendly, but it
already works once loaded as an external package on the
central-analysis computer.
Please consider whether referencing such external functionality in
the next release would be useful (I can write some documentation for
it). Note that this simulation-based approach can also be readily used
for frailty Cox regression and Nelson-Aalen estimation in
DataSHIELD (examples to come).
Before I do any more work on the ‘Tom Bishop GLMM’ (i.e. study-level meta-analysis of separate GLMMs), I would like to understand a bit more. I have looked at the GitHub links, but was wondering if there was a paper I could look at to get a bit more context.
I think you mean a proposal to do actual research with GLMM. Angela wants to use GLMM in her study regarding pets and asthma. I can ask her to give you more context on how she wants to use the function in DataSHIELD?
Sorry, I should have been more specific. I was actually asking @bono for more information about his way of doing GLMM, as that might give LifeCycle a solution now, rather than waiting for me to do the study-level meta-analysis (SLMA) version.
But, it would also be helpful to get more information about the planned analyses, in case we need to press on with the SLMA version.
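For readers less familiar with SLMA, the idea can be sketched as follows: each study fits its own model locally and returns only a coefficient estimate and its standard error, which are then pooled on the client side. Below is a minimal fixed-effect, inverse-variance sketch in Python. It is only an illustration of the pooling step, not the DataSHIELD implementation; the function name is hypothetical and the numbers in the usage lines are made up.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study estimates.

    estimates  : one coefficient estimate per study
    std_errors : the matching standard errors
    Returns the pooled estimate and its standard error.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical per-study log odds ratios and standard errors.
pooled, pooled_se = pool_fixed_effect([0.42, 0.30, 0.55], [0.10, 0.15, 0.20])
print(pooled, pooled_se)
```

A random-effects version would add a between-study variance term to each weight; in R this pooling step is what a client-side package such as metafor would do.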
Actually, while I am here, we would also like to have Cox regression via DataSHIELD, which is mentioned in @bono's post. Perhaps this is already possible?
Yes, it is now possible to approximate (frailty) Cox with the
simulation-based method. Unfortunately we are a bit behind
with documentation and paper submission. We have a submitted paper
on a GLMM Binomial example, whose results I showed at the last DS
workshop. I attach the abstract below.
@tombishop: I'm available for any inquiry. I can even meet you
on Skype next week for a more direct exchange, if this does not
break any forum rule.
Abstract
Title: Recovery of IPD inferences from key IPD summaries only:
Applications within DataShield
Currently just some Individual Participant Data (IPD) statistical
models, like standard GLM regressions, can be performed within
DataShield while protecting privacy. For example, reproduction of an
IPD random-effects regression from anonymous IPD summaries can be
already challenging here. Our goal is to extend the inferential
capabilities within DataShield. To this end we propose a method to
reconstruct unavailable original IPD and IPD inferences from
empirical IPD marginal moments and correlation matrix only.
The approach is rather generic and is based on a copula inversion
technique where IPD marginal and dependence modeling is guided by the
above summaries. Through practical examples we show our method can
well recover fixed and latent effect estimates of an IPD
multi-variate Logistic regression, suggesting new applications within
DataShield are possible.
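To make the idea in the abstract more concrete, here is a minimal sketch (not the code from the paper) of generating pseudo-IPD from summaries with a Gaussian copula. It assumes one continuous variable with a normal marginal and one binary variable, and it takes the correlation of the latent normals as given; in practice that latent correlation would have to be calibrated so that the simulated data reproduce the observed correlation. The function name and parameters are hypothetical, and Python is used only for illustration.

```python
import math
import random
from statistics import NormalDist

def simulate_pseudo_ipd(n, mean_x, sd_x, p_y, latent_rho, seed=0):
    """Draw n pseudo-records (x, y) from a two-variable Gaussian copula.

    x is continuous with a normal marginal (mean_x, sd_x).
    y is binary with prevalence p_y (0 < p_y < 1).
    latent_rho is the correlation of the underlying standard normals
    (an assumption here; it is not the observed x-y correlation).
    """
    rng = random.Random(seed)
    # y = 1 whenever its latent normal exceeds this threshold,
    # which gives P(y = 1) = p_y.
    threshold = NormalDist().inv_cdf(1.0 - p_y)
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Correlated second latent normal (2x2 Cholesky factor by hand).
        z2 = latent_rho * z1 + math.sqrt(1.0 - latent_rho ** 2) * noise
        x = mean_x + sd_x * z1           # invert the normal marginal
        y = 1 if z2 > threshold else 0   # invert the Bernoulli marginal
        rows.append((x, y))
    return rows

# Illustrative summaries only (made-up values).
pseudo = simulate_pseudo_ipd(1000, mean_x=50.0, sd_x=10.0, p_y=0.3, latent_rho=0.5)
print(pseudo[:3])
```

Any model that needs record-level data (for example a random-effects regression) could then be fitted to the simulated records on the central analysis computer, since only the summaries ever leave the data sites.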
@demetris.avraam the code can definitely be
made more DS friendly, including for instance calls to
automatically retrieve the needed IPD summaries. However, note that the
code only needs to sit on the central analysis computer and hence is
perfectly functional right now. Feel free to contact me for any
inquiry.
I agree that the metafor package does not require any DataSHIELD development because you run it on the client side. Also, I think lme4 (which is for mixed models) is covered by the requirement for GLMM.
Thanks @bono, I will have a look at your code and we can discuss it in more detail at some point. At the moment, I’m trying to develop a ds.glmm function based on Elinor’s approach.
About Cox regression:
@Gijs and Frank, from the Netherlands Comprehensive Cancer Organisation, have developed an algorithm for Cox Proportional Hazards in R for their infrastructure (https://github.com/IKNL/dcoxph) and we believe that their algorithm can also be implemented in DataSHIELD. You can see more details in this paper: https://www.ncbi.nlm.nih.gov/pubmed/26159465. Also, @Gijs will give a talk at the DataSHIELD Workshop.
Hello all, we have the Cox PH algorithm implemented, but it runs on our Docker-based infrastructure. I hope that we can discuss how to bring this algorithm to DataSHIELD during the September workshop. @bono, if you need more info (or if you can’t wait), please reach out!