Seed for random number generation (RNG)

demetris.avraam · 20 June 2019 10:24

In some DataSHIELD functions we need to add a small random number to the outputs before returning them to the client (see, for example, the graphical functions). The embedded random numbers follow a normal distribution. If the RNG seed is not fixed to a constant value, repeated calls of the same function on a given dataset will give different results, but their average will converge to the real values (by the law of large numbers), which is disclosive. To overcome this issue we must fix the seed, while keeping it secret from the user.
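The averaging problem described above can be illustrated with a minimal sketch. The function `noisy_output`, its `noise_sd` parameter, the zero-mean noise and the example values are all hypothetical stand-ins, not actual DataSHIELD code:

```r
# Hypothetical stand-in for a server-side function that perturbs its output.
# The name, the zero-mean noise and noise_sd are assumptions for illustration.
noisy_output <- function(x, seed = NULL, noise_sd = 0.5) {
  if (!is.null(seed)) set.seed(seed)  # fix the RNG state before drawing noise
  x + rnorm(length(x), mean = 0, sd = noise_sd)
}

true_values <- c(10, 20, 30)

# No fixed seed: each call differs, and averaging many calls recovers the data.
set.seed(1)  # only here to make the demonstration reproducible
unseeded <- replicate(10000, noisy_output(true_values))
avg_unseeded <- rowMeans(unseeded)

# Fixed seed: every call returns the same perturbed values,
# so averaging repeated calls reveals nothing new.
seeded <- replicate(10000, noisy_output(true_values, seed = 42))
avg_seeded <- rowMeans(seeded)

print(round(avg_unseeded - true_values, 3))  # offsets shrink towards zero
print(round(avg_seeded - true_values, 3))    # offsets equal those of a single call
```

With the seed fixed, the average of repeated calls is just the single perturbed output, which is why the seed itself has to stay hidden from the user.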

I have developed a server-side function (https://github.com/datashield/dsBetaTest/blob/master/R/seedDS.o.R) that generates the seed number from an input vector, but the way this number is generated is not optimal. Do you have any thoughts on how to optimise the generation of a seed number based on an input vector?

One idea is that Opal generates a study-specific seed number when the data are uploaded to the Opal server. Another idea is that the data owner specifies the seed number as part of the data dictionary, but this number should be kept secret from everyone else. @yannick do you think that either of these two options is possible?

yannick · 20 June 2019 11:28

Sure, it is no problem for Opal (or whichever data repository is used; think DSI) to provide a server-instance-specific seed number at DataSHIELD R session creation. This can be part of the specifications of a DataSHIELD-compatible data repository. What would be the scope of this seed number? Study-specific, R-session-specific, etc.?

Yannick

demetris.avraam · 20 June 2019 12:15

Great! This should be study-specific, so that any user who analyses data from a given study always gets the same answer as any other user who runs the same analysis on the same data.

yannick · 20 June 2019 12:34

OK, so this is the right time to do it, as there is an Opal release coming soon.

Would an R option datashield.seed work for you? In your server-side R code you would get the seed number with getOption("datashield.seed"). Opal will ensure the seed is always the same.
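A minimal sketch of how server-side code might consume this option. Only getOption("datashield.seed") comes from the proposal above; the helper name `add_seeded_noise`, its `noise_sd` parameter and the example seed value are assumptions for illustration:

```r
# Hypothetical server-side helper; the name and noise_sd are assumptions.
add_seeded_noise <- function(x, noise_sd = 0.25) {
  seed <- getOption("datashield.seed")
  if (is.null(seed)) {
    stop("R option 'datashield.seed' is not set on this server", call. = FALSE)
  }
  set.seed(as.integer(seed))  # same seed, so same noise on every call
  x + rnorm(length(x), mean = 0, sd = noise_sd)
}

# Local stand-in for what the data repository would do at session creation.
options(datashield.seed = 12345)

first  <- add_seeded_noise(c(1.5, 2.5, 3.5))
second <- add_seeded_noise(c(1.5, 2.5, 3.5))
print(identical(first, second))
```

Note that calling set.seed() inside the function also resets the global RNG state of the R session, so any later random draws in that session are affected.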

Is there a preferred range of values for the seed number?

demetris.avraam · 20 June 2019 13:00

Yes, an R option datashield.seed would work. The seed can be any number; we don’t have a preferred range. The data owner can specify this number for their study. However, we want this number to be hidden from all users. We don’t want it to be visible in the Opal interface like the other R options, nor in any package’s DICTIONARY.

yannick · 20 June 2019 13:31

Opal already has a secret key, which is instance-specific and generated at first run. We can have a seed number as well, hidden from the users.

The opal feature request issue.

Yannick

demetris.avraam · 20 June 2019 13:34

Great! Thanks Yannick.

yannick · 20 June 2019 15:53

What about the case where datasets from different studies are hosted on the same Opal? They could even be in the same project. The seed cannot depend on the dataset name or path, because one could assign several tables during the same DataSHIELD session. I am a bit lost as to what you need.

demetris.avraam · 21 June 2019 14:49

Hi @yannick. Yes, it’s OK for different studies that are hosted on the same Opal to share the same seed.

yannick · 2 July 2019 14:25

Hi,

The datashield.seed R option is available in Opal 2.14, which was released yesterday.

Cheers

Yannick

demetris.avraam · 4 July 2019 11:37

Thanks Yannick. I will try the generated seed and send you any feedback.

Best, Demetris
