Hi DataSHIELD members,
We are working with high-dimensional data (n << p), i.e. omics data, so we are learning how to implement two common machine learning tools, LASSO and ridge regression, in DataSHIELD. The implementation must be non-disclosive.
However, while studying this, we found the following statement on the page: “…This disclosure filter protects against fitting overly saturated models which can be disclosive. The choice of 0.37 is entirely arbitrary…”
Can anyone explain why a saturated model is disclosive, and how individual-level information could be recovered from one? This matters to us because a LASSO fit is sometimes saturated (and a ridge fit always is). Is there any way to avoid this disclosure risk for machine learning methods?
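
To make the question concrete, here is a minimal sketch of what we understand "saturated" to mean. It is our own toy illustration in Python with NumPy, not DataSHIELD code, and the data and variable names are made up. With at least as many parameters as individuals (p >= n), an unpenalised least-squares fit can reproduce every response exactly, so the fitted values are the individual-level values. We are assuming this is the kind of leak the filter is meant to prevent, but we are not sure.

```python
import numpy as np

# Toy data (hypothetical): n individuals, p covariates, with n << p.
rng = np.random.default_rng(0)
n, p = 5, 20
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Unpenalised least squares; with p >= n, lstsq returns the
# minimum-norm solution, which interpolates the data.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values coincide with the observed responses, so anyone
# holding beta and X can reconstruct each individual's y.
fitted = X @ beta
max_residual = float(np.max(np.abs(y - fitted)))
print("largest absolute residual:", max_residual)
```

In this sketch the residuals are zero up to floating-point error. What we do not know is how much of this carries over once a LASSO or ridge penalty is added, which is part of what we are asking.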