Hello,
We are starting to put together the v6.2 release of DataSHIELD; the current draft release notes are below. We are keen to hear any comments and feedback.
Stuart
Draft DataSHIELD Release Notes v6.2
Focus of Release
The changes in the v6.2 release of DataSHIELD mainly focus on enhancing the disclosure controls available to data owners, and on additional analytical and presentation methods for data analysis.
Changes from DataSHIELD v6.1.1 to v6.2
Checking Permissive PrivacyControlLevel
To support data owners who have particularly sensitive data, additional disclosure protection has been added to the v6.2 release. These changes permit a data owner to place a service into “permissive” (default) or “non-permissive” disclosure mode. This is done by setting the “datashield.privacyControlLevel” option. The service will be in “permissive” mode if the option has the value “permissive”; any other value will cause the service to be in “non-permissive” mode.
If a service is in “non-permissive” mode, certain methods are blocked from being invoked by the client. The blocked methods are:
dataFrameSubsetDS1 | rbindDS |
---|---|
levelsDS | recodeLevelsDS |
cDS | recodeValuesDS |
cbindDS | repDS |
dataFrameDS | reShapeDS |
dataFrameSortDS | seqDS |
dataFrameSubsetDS2 | subsetByClassDS |
dmtC2SDS | subsetDS |
In addition, the method aliases for ‘base::c’, ‘base::cbind’ and ‘base::rep’ have been removed.
Not having access to these methods will mean that the Data Owner will be required to perform more data shaping for the Data User(s).
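The mode selection described above can be sketched in a few lines of base R. This is an illustration only, not the dsBase implementation: the helper names is_permissive and guard_method are hypothetical, the value "strict" is just an arbitrary non-“permissive” value, and treating an unset option as “permissive” is an assumption based on that mode being described as the default.

```r
# Illustrative sketch only; these helpers are hypothetical, not dsBase functions.

# TRUE only when the option holds exactly "permissive".
# Assumption: an unset option falls back to "permissive" (the stated default).
is_permissive <- function() {
  level <- getOption("datashield.privacyControlLevel", default = "permissive")
  identical(as.character(level), "permissive")
}

# Stop with an error when a blocked method is called in non-permissive mode.
guard_method <- function(method_name) {
  if (!is_permissive()) {
    stop("BLOCKED: '", method_name, "' is not available in non-permissive mode",
         call. = FALSE)
  }
  invisible(TRUE)
}

old <- options(datashield.privacyControlLevel = "permissive")
permissive_ok <- guard_method("cbindDS")  # passes silently

# Any value other than "permissive" switches the service to non-permissive mode.
options(datashield.privacyControlLevel = "strict")
blocked <- inherits(try(guard_method("cbindDS"), silent = TRUE), "try-error")

options(old)  # restore the previous setting
```

In a real deployment the option is set by the data owner on the server, not by the client.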
Changing disclosure settings
In this release, there are new disclosure settings that data owners can specify. The new “default.nfilter.levels.density” and “default.nfilter.levels.max” options have been added, with default values of 0.33 and 40 respectively. These options are described on the wiki page: https://data2knowledge.atlassian.net/wiki/x/DoCaKg
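As a rough illustration of how two such thresholds could act together, the base R sketch below checks a vector against both. It rests on assumptions that are not stated in these notes: that the density setting caps the ratio of distinct levels to vector length, and that the max setting caps the absolute number of levels. The helper levels_allowed is hypothetical; the wiki page remains the authoritative description.

```r
# Hypothetical helper, not part of dsBase.
# Assumed semantics: 'density' caps (number of levels / vector length) and
# 'max_levels' caps the absolute number of levels.
levels_allowed <- function(
    x,
    density    = as.numeric(getOption("default.nfilter.levels.density", 0.33)),
    max_levels = as.numeric(getOption("default.nfilter.levels.max", 40))) {
  n_levels <- length(unique(x[!is.na(x)]))
  n_levels <= density * length(x) && n_levels <= max_levels
}

ok_few       <- levels_allowed(rep(1:3, 10))   # 3 levels in 30 values
ok_too_dense <- levels_allowed(1:10)           # 10 levels in 10 values
ok_too_many  <- levels_allowed(rep(1:50, 10))  # 50 levels in 500 values
```

With the default values quoted above, only the first vector would pass both checks under these assumed semantics.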
New Functions
The following functions have been added to version 6.2 of the DataSHIELD dsBaseClient package.
ds.hetcor: computes a heterogeneous correlation matrix, consisting of Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables.
ds.lspline: computes the basis of a piecewise-linear spline such that, depending on the argument “marginal”, the coefficients can be interpreted as (1) slopes of consecutive spline segments, or (2) slope change at consecutive knots. This is an assign function which saves the created object on the serverside.
ds.qlspline: this is similar to ds.lspline but it calculates the knot positions to be at quantiles of the input variable.
ds.elspline: this is similar to ds.lspline but it calculates the knot positions such that they cut the range of the input variable into n equal-width intervals.
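The difference between the two knot-placement rules can be seen on a small local vector. The sketch below uses base R only and is not the DataSHIELD implementation; the variable names and the choice of n = 4 intervals are illustrative assumptions.

```r
# Local illustration of the two knot-placement rules (not DataSHIELD code).
x <- c(1, 2, 3, 4, 10, 20, 30, 40, 100)
n <- 4  # number of intervals, giving n - 1 interior knots

# Quantile-based knots (the idea behind ds.qlspline): each interval holds
# roughly the same share of the observations.
probs <- seq(0, 1, length.out = n + 1)
quantile_knots <- quantile(x, probs = probs[-c(1, n + 1)], names = FALSE)

# Equal-width knots (the idea behind ds.elspline): each interval covers
# the same span of the variable's range.
breaks <- seq(min(x), max(x), length.out = n + 1)
equal_width_knots <- breaks[-c(1, n + 1)]

quantile_knots     # 3 10 30
equal_width_knots  # 25.75 50.50 75.25
```

For a skewed variable like this one, the quantile-based knots sit where the data are dense, while the equal-width knots are spread evenly over the range.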
ds.ns: generates a basis matrix for representing the family of piecewise-cubic splines with a specified sequence of interior knots, and natural boundary conditions. This is an assign function which saves the created object on the serverside.
ds.dmtC2S: supports the need to be able to transfer complex variables from the client-side to the server-side(s). This is an assign type method. The types of variables which can be transferred are data.frame, matrix or tibble.
ds.asFactorSimple: converts an input variable into a factor. Unlike ds.asFactor and its serverside functions, ds.asFactorSimple does no more than coerce the class of a variable to factor in each study. It does not check for or enforce consistency of factor levels across sources or allow you to force an arbitrary set of levels unless those levels actually exist in the sources. In addition, it does not allow you to create an array of binary dummy variables that is equivalent to a factor. If you need to do any of these things, you will have to use the ds.asFactor function.
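The practical consequence of not enforcing consistent levels can be shown with a base R analogy, in which two local vectors stand in for the same variable held in two studies. This is only an analogy and does not call any DataSHIELD function; the vector names are illustrative.

```r
# Two local vectors standing in for one variable held in two studies.
study1 <- c("low", "high", "low")
study2 <- c("high", "medium")

# Simple coercion in each study: levels reflect only the values present locally.
f1 <- factor(study1)
f2 <- factor(study2)
levels(f1)  # "high" "low"
levels(f2)  # "high" "medium"
same_levels <- identical(levels(f1), levels(f2))

# Enforcing a common set of levels needs the union of values across studies.
all_levels <- sort(unique(c(study1, study2)))
g1 <- factor(study1, levels = all_levels)
g2 <- factor(study2, levels = all_levels)
```

After simple coercion the two factors have different level sets, which is the situation in which ds.asFactor would be needed instead.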
ds.metadata: obtains the non-disclosive metadata associated with a variable held on the server.
ds.ranksSecure: securely generates the ranks of a numeric vector and estimates true global quantiles across all data sources simultaneously (see https://data2knowledge.atlassian.net/wiki/x/AYDPog for details).
ds.unique: generates a variable on the server-side which represents a version of an existing variable but without any duplicate values.
ds.forestplot: draws a forestplot of the coefficients for Study-Level Meta-Analysis (*)
(*) Provided by Xavier Escribà Montagut, Barcelona Institute of Global Health (ISGlobal), Spain
Changed Functions
ds.replaceNA: This new version of ds.replaceNA can replace NAs in factor variables. The replaced values are then considered as additional levels of the factor.
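What “considered as additional levels” means can be sketched locally in base R. The helper replace_na_in_factor is hypothetical and is not the ds.replaceNA implementation; it only mirrors the described outcome.

```r
# Hypothetical local helper mirroring the described behaviour.
replace_na_in_factor <- function(f, value) {
  # Add the replacement as a new level if the factor does not have it yet.
  if (!(value %in% levels(f))) {
    levels(f) <- c(levels(f), value)
  }
  f[is.na(f)] <- value
  f
}

f <- factor(c("a", NA, "b", NA, "a"))
g <- replace_na_in_factor(f, "missing")
levels(g)  # "a" "b" "missing"
```

Without the extra level, assigning a value that is not an existing level would produce NA again along with a warning.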
ds.tapply.assign: Major refactoring which ensures that variables are present in all servers. Also fixed an issue so that variables which include missing values, and not only complete cases, are dealt with correctly.
ds.tapply: Major refactoring which ensures that variables are present in all servers. Also fixed an issue so that variables which include missing values, and not only complete cases, are dealt with correctly.
ds.mean: the behaviour when all values are NA has been changed; if ds.mean is called on a vector, on a server, which only contains NAs, the result from the server will be NA, instead of causing a disclosure block.
ds.var: the behaviour when all values are NA has been changed; if ds.var is called on a vector, on a server, which only contains NAs, the result from the server will be NA, instead of causing a disclosure block.
ds.table: The new version allows the user to specify only two options for the argument useNA: either “no” or “always”. The option “ifany”, which was available in v6.1.1, is no longer allowed.
ds.corTest: The new version allows the user to get Kendall’s tau or Spearman’s rho correlation coefficient for a pair of variables, in addition to the existing Pearson’s correlation. The new arguments are: method, which can be one of “pearson” (default), “kendall”, or “spearman”; exact, a logical indicating whether an exact p-value should be computed for Kendall’s tau or Spearman’s rho; conf.level, which defines the level of the returned confidence interval; and type, which defines whether a study-specific correlation coefficient is returned or a combined correlation across all studies (the combined correlation is an approximation of the exact pooled correlation and is estimated based on Fisher’s z transformation).
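The idea of combining study-specific correlations through Fisher’s z transformation can be sketched as follows. This is a generic textbook version, not necessarily what ds.corTest does internally: the weights of n - 3 are the usual fixed-effect choice and are an assumption here, and the correlations and sample sizes are made-up illustrative inputs.

```r
# Made-up study-level inputs for illustration only.
r <- c(0.50, 0.30, 0.40)  # study-specific Pearson correlations
n <- c(100, 50, 200)      # study sample sizes

z <- atanh(r)             # Fisher's z transformation of each correlation
w <- n - 3                # assumed weights: inverse variance of z is n - 3

z_pooled   <- sum(w * z) / sum(w)  # weighted mean on the z scale
r_combined <- tanh(z_pooled)       # back-transform to a correlation

# Approximate 95% confidence interval, built on the z scale.
se <- 1 / sqrt(sum(w))
ci <- tanh(z_pooled + c(-1, 1) * qnorm(0.975) * se)
```

Because the averaging happens on the z scale, the result approximates the correlation that would be obtained from the pooled individual-level data, without being identical to it.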
ds.glmSLMA, ds.lmerSLMA and ds.glmerSLMA: the changes to these functions are as follows:
- we made sure that the grouping factor (i.e. the variable after the “|”) in the mixed model is excluded from the set of checks that are normally used for standard GLMs. Applying those checks was not appropriate, as it blocked users from running models when there was a small number of individuals in the groups (e.g. siblings in family groups). Having a small number of individuals in a group is not a disclosure issue for mixed models and hence it should be permitted.
- we improved the handling of errors when something goes wrong in the underlying lme4 functions that are used. Previously, the error message returned to the user was not the one from the underlying function, making it hard to debug what had gone wrong.
- we have added, to ds.glmSLMA, a notify.of.progress argument which can enable or disable the logging of progress.
ds.histogram: the function now allows the user to plot distinct histograms (one for each study) or a combined histogram that merges the single plots.
ds.Boole: an issue was fixed which meant that, under certain circumstances, incorrect results could be produced. This incorrect behaviour could occur if the right-hand operand was negative.
ds.asNumeric: has been changed to deal with different types of variables (including characters)
Client-side Testing Infrastructure
Additional tests and general test improvements are included in this release.
Testing has been added within client methods for the existence and class of the variables being used.
Server-side Testing Infrastructure
Additional tests, general test improvements, added privacy control level tests and improved error messages are included in this release.
Backward compatibility with v6.1.1 dsBaseClient
There are no known significant issues with using v6.1.1 dsBaseClient with v6.2 dsBase. The changes in behaviour which have been observed are limited to changes to the text of error messages, changes to the circumstances under which a disclosure block could occur and bug fixes.
Supported Versions
DataSHIELD v6.2 is supported on R 3.5, R 3.6, R 4.0 and R 4.1, and would be expected to work with intermediate versions. At present the DataSHIELD client-side package is known to work on Ubuntu 18.04, Ubuntu 20.04, Windows 10 and macOS Big Sur (11.6). The DataSHIELD server-side package is known to work when deployed to Opal 4.3.3 running on Ubuntu 18.04 and 20.04.
Code Availability
(Planned) As ever, you can obtain the code at a variety of places:
- DataSHIELD’s CRAN https://cran.datashield.org/
- Obiba’s CRAN https://cran.obiba.org/
- GitHub - datashield/dsBase in 6.2 tag
- GitHub - datashield/dsBaseClient in 6.2 tag