Some DataSHIELD functions add a small random number to their outputs before returning them to the client (see, for example, the graphical functions). The added noise follows a normal distribution, so if the seed of the RNG is not fixed to a constant value, repeated calls to the same function on a given dataset return different results whose average converges to the true values (by the law of large numbers), which is disclosive. To overcome this issue we must fix the seed, but we should keep it secret from the user.
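To make the averaging problem concrete, here is a minimal sketch in Python rather than R. The function `noisy_output`, the true value of 10 and the noise standard deviation of 0.5 are all made up for illustration and are not taken from any DataSHIELD function.

```python
import random
import statistics

def noisy_output(true_value, seed=None, sd=0.5):
    """Return true_value plus N(0, sd) noise (hypothetical stand-in for a
    server-side function). seed=None means the RNG is not fixed."""
    rng = random.Random(seed)
    return true_value + rng.gauss(0.0, sd)

true_value = 10.0  # illustrative value only

# Unfixed seed: every call draws fresh noise, so the mean of many calls
# approaches the true value.
unfixed = [noisy_output(true_value) for _ in range(20000)]
mean_unfixed = statistics.fmean(unfixed)

# Fixed seed: every call returns the same perturbed value, so repeating
# the call gives the user no additional information.
fixed = [noisy_output(true_value, seed=12345) for _ in range(5)]

print(abs(mean_unfixed - true_value))  # small: the noise averages out
print(len(set(fixed)))                 # 1: all repeated calls agree
```

With the seed fixed, the perturbation is the same on every call, which is why the seed itself then has to stay hidden from the user.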
I have developed a server-side function (https://github.com/datashield/dsBetaTest/blob/master/R/seedDS.o.R) that generates the seed from an input vector; however, the way the seed is currently generated is not optimal. Do you have any thoughts on how to improve the generation of a seed from an input vector?
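One possible direction, offered only as a rough sketch and not as a description of what `seedDS.o` does, is to hash the input vector and truncate the digest to an integer. The helper name `seed_from_vector` and the choice of SHA-256 are assumptions for illustration; the sketch is in Python, and the result is kept within the positive 32-bit integer range so that it could be passed to something like R's `set.seed`.

```python
import hashlib
import struct

def seed_from_vector(values):
    """Derive a deterministic seed from a numeric vector (hypothetical helper)."""
    # Pack each value as an 8-byte big-endian double so that the byte
    # encoding of the vector is unambiguous.
    payload = b"".join(struct.pack(">d", float(v)) for v in values)
    digest = hashlib.sha256(payload).digest()
    # Keep the first 4 bytes and clear the top bit, giving an integer
    # in the range 0 .. 2**31 - 1.
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF

seed = seed_from_vector([1.2, 3.4, 5.6])
print(seed)
```

The same vector always gives the same seed, and a small change in the vector gives an unrelated one. The caveat is that an unkeyed hash is only as secret as the data: anyone who can guess the vector can recompute the seed.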
One idea is for Opal to generate a study-specific seed when the data are uploaded to the Opal server. Another is for the data owner to specify the seed as part of the data dictionary, although this number would have to be kept secret from everyone else. @yannick do you think either of these two options is possible?
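To illustrate how a study-specific secret could be used if it were available, here is a minimal Python sketch. It does not reflect any existing Opal feature: `new_study_secret` and `keyed_seed` are hypothetical names, and where the secret would be stored is left open. The idea is to mix the secret with the input vector through an HMAC, so that the seed varies with the data but cannot be recomputed without the secret.

```python
import hashlib
import hmac
import secrets
import struct

def new_study_secret():
    """Generate a random 32-byte secret once per study, for example at
    upload time (hypothetical step, assumed to be stored server-side)."""
    return secrets.token_bytes(32)

def keyed_seed(study_secret, values):
    """Derive a seed from the vector and the study secret (hypothetical helper)."""
    # Encode the vector as 8-byte big-endian doubles.
    payload = b"".join(struct.pack(">d", float(v)) for v in values)
    # HMAC-SHA256 keyed with the study secret.
    digest = hmac.new(study_secret, payload, hashlib.sha256).digest()
    # Truncate to an integer in the range 0 .. 2**31 - 1.
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF

study_secret = new_study_secret()
seed_a = keyed_seed(study_secret, [1.2, 3.4, 5.6])
seed_b = keyed_seed(study_secret, [1.2, 3.4, 5.6])
print(seed_a == seed_b)  # True: same secret and vector give the same seed
```

A seed typed in by the data owner could play the same role as the generated secret here, provided it is never returned to the client.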