Hi all,
I have started porting the machine learning algorithms in RMTL to the DataSHIELD framework, and I would like to implement a LASSO algorithm as a first example. For this, I need to learn how DataSHIELD transfers the intermediate results of each iteration (e.g. the Fisher information matrix in glm). In addition, I need to know how to install a new function for testing.
Could anyone point me to the relevant tutorials and information? It would be much appreciated.
Regards,
Hank
@demetris.avraam, can you please send Hank some info?
Hi Hank,
In the following two links you can find information on how to develop and test functions in DataSHIELD:
https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/658505761/Testing
https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/12943455/Tutorial+for+developers
Unfortunately, we haven’t updated the “tutorial for developers” for a while, so you may find that some parts refer to previous versions of DataSHIELD. If you have specific questions during development that the wiki does not cover, you can post them here and we will try to help you.
Also, the following two links point to the code of the client-side ds.glm function and of the second server-side function, glmDS2, which computes and returns the information matrix and the score vector at each iteration:
#' @title ds.glm calling glmDS1, glmDS2
#' @description Fits a generalized linear model (glm) on data from a single or multiple sources
#' @details Fits a glm on data from a single source or from multiple sources. In the latter case
#' the data are co-analysed (when using ds.glm) by using an approach that is mathematically
#' equivalent to placing all individual-level
#' data from all sources in one central warehouse and analysing those data using the conventional
#' glm() function in R. In this situation marked heterogeneity between sources should be corrected
#' (where possible) with fixed effects. e.g. if each study in a (binary) logistic regression
#' analysis has an independent intercept, it is equivalent to allowing each study to have a
#' different baseline risk of disease. This may also be viewed as being an IP (individual person)
#' meta-analysis with fixed effects.
#'
#' Privacy protected iterative fitting of a glm is explained here:
#'
#' (1) Begin with a guess for the coefficient vector to start iteration 1 (let's call it
#' beta.vector[1]). Using beta.vector[1], run iteration 1 with each source
#' calculating the resultant score vector (and information matrix) generated
#' by its data - given beta.vector[1] -
#' as the sum of the score vector components (and the sum of the components of the
#' information matrix) derived from each individual data record in that source. NB in most models
This file has been truncated.
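The iteration described in the header above can be summarised in standard Newton-Raphson (Fisher scoring) notation. This is a sketch of the textbook update, not a quote from the DataSHIELD source: the symbols $s$ (source), $i$ (record within a source), $U$ (score vector) and $\mathcal{I}$ (information matrix) are my own notation, and $\beta^{(t)}$ corresponds to beta.vector[t].

```latex
% Each source s returns only the sums over its own records i:
U_s\bigl(\beta^{(t)}\bigr) = \sum_{i \in s} u_i\bigl(\beta^{(t)}\bigr),
\qquad
\mathcal{I}_s\bigl(\beta^{(t)}\bigr) = \sum_{i \in s} \mathcal{I}_i\bigl(\beta^{(t)}\bigr)

% The client adds the per-source summaries and updates the coefficients:
\beta^{(t+1)} = \beta^{(t)}
  + \Bigl[\sum_{s} \mathcal{I}_s\bigl(\beta^{(t)}\bigr)\Bigr]^{-1}
    \sum_{s} U_s\bigl(\beta^{(t)}\bigr)
```

Because both quantities are plain sums over records, adding the per-source results gives the same score vector and information matrix as a single pooled dataset would.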
#'
#' @title glmDS2 called by ds.glm
#' @description This is the second serverside aggregate function called by ds.glm.
#' @details It is an aggregate function that uses the model structure and starting
#' beta.vector constructed by glmDS1 to iteratively fit the generalized linear model
#' that has been specified. The function glmDS2 also carries out a series of disclosure
#' checks and if the arguments or data fail any of those tests,
#' model construction is blocked and an appropriate serverside error message is
#' created and returned to ds.glm on the clientside.
#' For more details please see the extensive header for ds.glm.
#' @param formula a glm() formula consistent with R syntax eg U~x+y+Z to regress
#' variables U on x, y and Z
#' @param family a glm() family consistent with R syntax eg "gaussian", "poisson",
#' "binomial"
#' @param beta.vect a numeric vector created by the clientside function specifying the
#' vector of regression coefficients at the current iteration
#' @param offset an optional variable providing a regression offset
#' @param weights an optional variable providing regression weights
#' @param dataName an optional character string specifying a data.frame object holding
#' the data to be analysed under the specified model same
This file has been truncated.
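To make the division of labour concrete, here is a minimal sketch in plain R that mimics the pattern locally for a logistic regression. It does not use any DataSHIELD calls: `local_score_info()` and `fit_federated_logistic()` are hypothetical names standing in for a server-side aggregate function and a client-side loop, and the two "studies" are simulated. It only illustrates the idea that each source returns a score vector and an information matrix, which the client sums before updating the coefficients.

```r
# Hypothetical stand-in for a server-side aggregate function: it sees only its
# own data and returns summary statistics, never individual records.
local_score_info <- function(X, y, beta) {
  eta <- as.vector(X %*% beta)
  mu  <- 1 / (1 + exp(-eta))   # fitted probabilities (logit link)
  w   <- mu * (1 - mu)         # binomial variance, used as IRLS weights
  list(
    score = as.vector(t(X) %*% (y - mu)),  # score vector for this source
    info  = t(X) %*% (X * w)               # information matrix X' W X
  )
}

# Hypothetical client-side loop: sum the per-source summaries, then take a
# Newton-Raphson step on the combined quantities.
fit_federated_logistic <- function(sources, n_coef, tol = 1e-8, max_iter = 25) {
  beta <- rep(0, n_coef)
  for (iter in seq_len(max_iter)) {
    parts <- lapply(sources, function(s) local_score_info(s$X, s$y, beta))
    score <- Reduce(`+`, lapply(parts, `[[`, "score"))
    info  <- Reduce(`+`, lapply(parts, `[[`, "info"))
    step  <- as.vector(solve(info, score))
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  list(coefficients = beta, iterations = iter)
}

# Simulated data, split across two pretend studies (assumption for illustration).
set.seed(1)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)
sources <- list(
  list(X = X[1:150, , drop = FALSE], y = y[1:150]),
  list(X = X[151:n, , drop = FALSE], y = y[151:n])
)

fed    <- fit_federated_logistic(sources, n_coef = ncol(X))
pooled <- glm(y ~ x, family = binomial)  # reference fit on the pooled data

print(fed$coefficients)
print(unname(coef(pooled)))
```

The two printed coefficient vectors should agree up to numerical tolerance, since the summed score and information matrix are the same as those of the pooled data. A gradient-based LASSO solver would presumably reuse the same "sum the per-source quantities" pattern with a different update step, but that is not shown here.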