ds.cov - Privacy concerns

Dear all,

Regarding the option to calculate a covariance matrix (which I like very much!), I would like to suggest an additional filter. Specifically, I am concerned that if the number of values in the covariance matrix exceeds the number of individuals, those values could be used to reconstruct individual-level values. I see several options:

  • The number of values in the covariance matrix must be smaller than the number of individuals
  • Same as the one before, but with a proportion the host can define (e.g. number of covariance values at most 0.8 * number of individuals)
  • Same as the two before, but based on the number of variables instead of the number of values in the covariance matrix
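
To make the difference between the three options concrete, here is a minimal sketch in Python. It is only an illustration, not DataSHIELD code: all function names are hypothetical, the 0.8 proportion is just the example value from the list, and it assumes that "number of values in the covariance matrix" means the distinct entries of the symmetric matrix, i.e. p * (p + 1) / 2 for p variables.

```python
def distinct_cov_entries(n_variables: int) -> int:
    """Distinct entries of a symmetric p x p covariance matrix: p * (p + 1) / 2.

    Assumption: "number of values" counts each covariance once, not twice.
    """
    return n_variables * (n_variables + 1) // 2


def passes_option_1(n_variables: int, n_individuals: int) -> bool:
    # Option 1: fewer covariance values than individuals.
    return distinct_cov_entries(n_variables) < n_individuals


def passes_option_2(n_variables: int, n_individuals: int,
                    proportion: float = 0.8) -> bool:
    # Option 2: as option 1, but scaled by a host-defined proportion.
    return distinct_cov_entries(n_variables) <= proportion * n_individuals


def passes_option_3(n_variables: int, n_individuals: int,
                    proportion: float = 0.8) -> bool:
    # Option 3: compare the number of variables, not the number of matrix values.
    return n_variables <= proportion * n_individuals


if __name__ == "__main__":
    # 10 variables give 55 distinct covariance values; with 50 individuals
    # options 1 and 2 would block the request, option 3 would allow it.
    for check in (passes_option_1, passes_option_2, passes_option_3):
        print(check.__name__, check(10, 50))
```

Under these assumptions the first two options become restrictive quickly as variables are added, because the number of covariance values grows quadratically with the number of variables, while the third grows only linearly.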

Best wishes, Daniela

Hi @daniela.zoeller,

Yes, that makes sense. I will modify the function based on your second suggestion and add an extra filter specific to the covariance function (and probably the same filter for the correlation function) to the list of disclosure controls.

Also, I think we need another filter that checks how many unique values exist in each variable and blocks the return of the results if that number is below a threshold (defined as a percentage of the length of the variable).
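
A rough sketch of what such a unique-value check could look like, written in Python purely for illustration. The function name and the 10% default are hypothetical and not an existing DataSHIELD setting; the threshold is taken as a proportion of the variable's length, as described above.

```python
from typing import Sequence


def has_too_few_unique_values(values: Sequence[float],
                              min_unique_proportion: float = 0.1) -> bool:
    """Return True if the output for this variable should be blocked.

    The threshold is a proportion of the variable's length;
    the 0.1 default is a made-up value for illustration only.
    """
    if len(values) == 0:
        # Nothing to protect, but also nothing meaningful to return.
        return True
    n_unique = len(set(values))
    return n_unique < min_unique_proportion * len(values)


if __name__ == "__main__":
    binary_like = [0, 1] * 50          # 100 values, only 2 unique
    continuous_like = list(range(100)) # 100 values, all unique
    print(has_too_few_unique_values(binary_like))
    print(has_too_few_unique_values(continuous_like))
```

With these example inputs the binary-like variable would be blocked (2 unique values against a threshold of 10) and the all-unique one would pass.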

Hi Daniela,

Just an update on this. We have updated the ds.cov function (and similarly ds.cor), and we now block the calculation of the covariance matrix if the number of variables is bigger than a pre-specified proportion of the number of individuals. To define this proportion we use the same protection filter as the one that checks for oversaturation in regression models, which is set to 33% by default. The new versions will be included in the coming release of DataSHIELD.
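
To give a feel for what the 33% default means in practice, here is a small Python sketch of the rule as described above. It is not the actual server-side implementation; the function names are hypothetical and only the 0.33 default comes from this thread.

```python
import math


def cov_is_blocked(n_variables: int, n_individuals: int,
                   max_proportion: float = 0.33) -> bool:
    """Block when the number of variables exceeds the allowed proportion
    of the number of individuals (rule as described in this thread)."""
    return n_variables > max_proportion * n_individuals


def max_variables_allowed(n_individuals: int,
                          max_proportion: float = 0.33) -> int:
    """Largest number of variables that would still pass the check."""
    return math.floor(max_proportion * n_individuals)


if __name__ == "__main__":
    for n in (10, 50):
        print(n, "individuals ->", max_variables_allowed(n), "variables allowed")
```

For example, under this reading a study with 10 individuals could request a covariance matrix for at most 3 variables, and a study with 50 individuals for at most 16.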

Thank you for the suggestion!