Question regarding ds.quantileMean

daniela.zoeller · 15 July 2019 14:36

Hi,

We encountered a problem when using ds.quantileMean(). We have three cohorts with sizes of about 2000, 2000, and 15000, respectively. When we run ds.quantileMean with and without type="split", we get strangely high values from the combined method: the combined 75% quantile is larger than the single-cohort 95% quantiles. The mean values seem to be right, and the combined mean is lower than the combined 5% quantile.

We have no access to the individual-level data (which is why we are using DataSHIELD), but we tried simulating data. In the simulation, the combined values are close to those of the large cohort (as expected).

The variable has a lot of zero values in the two smaller cohorts (in one cohort the 5%, 10%, and 25% quantiles are 0; in the other, the 5% and 10% quantiles are 0), but in the large cohort the 5% quantile is >0. Could this be the source of our problem?

Best wishes, Daniela

demetris.avraam · 15 July 2019 16:50

Hi Daniela,

There was a small bug in the way the ds.quantileMean() function handled missing values. The next version of DataSHIELD will include the corrected function. In the meantime, since the issue affects only the client-side function, you can run the script from the following link https://github.com/datashield/dsBaseClient/blob/master/R/ds.quantileMean.R in your client and then use the function as usual. The difference from the released version is an addition at line 84 of the code. I expect that this will give you the correct results, but let me know if you get anything unexpected.

Many thanks, Demetris

daniela.zoeller · 16 July 2019 09:52

Hi Demetris,

thank you very much! We will try this and get back to you if it doesn’t work!

Best wishes, Daniela

demetris.avraam · 6 August 2019 13:15

Hi @daniela.zoeller. I forgot to mention in my previous reply that for the ‘combined’ analysis the function returns the weighted average of the per-study quantiles (which is at the moment our best approximation). In the near future we want to develop an encryption-decryption algorithm that can rank the elements of a variable across multiple sources and then calculate its actual quantiles.
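
To illustrate why a weighted average of quantiles is only an approximation, here is a minimal sketch in Python rather than R. It is not the ds.quantileMean implementation: the helper names, the toy data, and the choice of cohort size as the weight are all assumptions made for illustration. It compares the weighted average of per-cohort medians with the median of the pooled data, which in DataSHIELD would not be available to the analyst.

```python
def quantile(values, p):
    """Empirical quantile using linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def weighted_average_quantile(cohorts, p):
    """Average the per-cohort quantiles, weighting each by cohort size (assumed weights)."""
    total = sum(len(c) for c in cohorts)
    return sum(len(c) * quantile(c, p) for c in cohorts) / total


def pooled_quantile(cohorts, p):
    """Quantile of the pooled individual-level data (needs access to all values)."""
    pooled = [v for c in cohorts for v in c]
    return quantile(pooled, p)


# Hypothetical toy cohorts: two small ones with many zeros, one larger one without.
small_a = [0, 0, 0, 1, 2]
small_b = [0, 0, 1, 2, 3]
large = list(range(1, 16))
cohorts = [small_a, small_b, large]

approx_median = weighted_average_quantile(cohorts, 0.5)
exact_median = pooled_quantile(cohorts, 0.5)
print("weighted average of medians:", approx_median)
print("median of pooled data:      ", exact_median)
```

With these toy cohorts the two numbers differ, so some gap between the combined value and the true pooled quantile is expected when the cohort distributions differ, for example when the smaller cohorts contain many zeros.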

daniela.zoeller · 7 August 2019 13:52

Thanks for letting us know! We will keep this in mind when we use this function.
