Theoretical limitations to statistical modelling in Federated Analysis


I wonder what the theoretical limitations are, e.g., which types of statistical models or types of data cannot be (re)formulated or modelled as a federated analysis? Could you share references supporting these limitations (or their absence)?

Best, Wilmar


It depends on the type of federated analysis you want to use. The approaches in DataSHIELD right now can in theory be extended to all score-based statistical approaches, e.g., GLMs. They are not easily usable for non-parametric approaches.
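To make the "score-based" idea concrete, here is a minimal sketch (not DataSHIELD's actual implementation; all function names are hypothetical) of a federated logistic GLM: each site computes only its local score vector and Fisher information matrix, the server sums these and takes Newton-Raphson steps. Because score and information are sums over observations, the federated fit is mathematically identical to the pooled fit.

```python
import numpy as np

def site_contributions(X, y, beta):
    """Computed locally at each site: score vector and information matrix
    for logistic regression. Only these aggregates (never the raw rows)
    would be sent to the coordinating server."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logistic mean function
    w = mu * (1.0 - mu)                      # IRLS weights
    score = X.T @ (y - mu)                   # local score (gradient) contribution
    info = X.T @ (X * w[:, None])            # local Fisher information contribution
    return score, info

def federated_glm(sites, p, n_iter=25):
    """Server side: sum score/information across sites, then Newton-Raphson.
    `sites` is a list of (X, y) pairs standing in for remote data nodes."""
    beta = np.zeros(p)
    for _ in range(n_iter):
        score = np.zeros(p)
        info = np.zeros((p, p))
        for X, y in sites:
            s, i = site_contributions(X, y, beta)
            score += s
            info += i
        beta = beta + np.linalg.solve(info, score)
    return beta
```

Running `federated_glm` on one pooled "site" and on the same data split across two sites gives the same coefficients, which is exactly why score-based models federate cleanly.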

If you use differential privacy, e.g., adding a noise term to the data, or secure multi-party computation, everything is implementable. The drawback is that the signal-to-noise ratio gets worse, or the computational effort becomes very heavy.
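As an illustration of the signal-to-noise trade-off Daniela mentions, here is a standard Laplace-mechanism sketch for releasing a bounded mean (the function name and parameterisation are mine, chosen for illustration): the noise scale is sensitivity / epsilon, so stronger privacy (smaller epsilon) directly inflates the noise relative to the signal.

```python
import numpy as np

def private_mean(values, lower, upper, epsilon, rng):
    """Laplace mechanism for the mean of values clipped to [lower, upper].
    The sensitivity of the mean of n bounded values is (upper - lower) / n,
    so the added noise has scale (upper - lower) / (n * epsilon).
    Smaller epsilon -> stronger privacy -> worse signal-to-noise ratio."""
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise
```

With a generous privacy budget the released mean is close to the true one; as epsilon shrinks, the same query returns increasingly noisy answers, which is exactly the degradation described above.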

Best, Daniela

From a machine learning perspective, a method can be distributed if its iterative updates can be decoupled. You might also want to look into "homomorphic encryption", which is directly related to your question.
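The "decoupled iterative update" point can be sketched as a federated-averaging-style round (an illustrative sketch with made-up function names, not a specific framework's API): each site takes a gradient step on its private data, and the server only ever sees and averages the resulting model vectors. In a homomorphically encrypted variant, the same per-site updates could be summed under an additively homomorphic scheme so the server never sees them in the clear; that part is not implemented here.

```python
import numpy as np

def local_step(w, X, y, lr):
    """One gradient step on a site's private data (least-squares loss).
    Only the updated weight vector leaves the site, never X or y."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def federated_round(w, site_data, lr=0.1):
    """Server side: average the locally updated models (FedAvg-style).
    With equal-sized sites this equals one global gradient step."""
    updates = [local_step(w, X, y, lr) for X, y in site_data]
    return np.mean(updates, axis=0)
```

Iterating `federated_round` converges to the same least-squares solution a pooled analysis would find, provided the per-site objectives decouple in this way; methods whose updates cannot be decoupled like this are the hard cases.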


Hi all,

All of these models can be implemented in privacy-preserving federated analysis, but we need to adapt our implementations to prevent disclosure of data through inference and other attacks, and to make them work with large datasets. For example, my own work with non-parametric statistics is becoming interesting in this respect. The algorithm I use is quite efficient with small datasets, i.e., up to about a third of the maximum size of a data frame in R; with larger datasets, however, scalability suffers, and that is taking up quite a lot of my thinking time at the moment. Other federated systems face the same scalability issues with machine learning, even when they do not apply privacy-preserving computations.