Bad performance of covariance

patrick.fischer · 4 April 2019 11:16

Hi,

my name is Patrick and I’m working with DataSHIELD as part of the MIRACUM research project. I’m currently trying to use PCA in combination with DataSHIELD, for which I need the covariance/correlation matrix of my variables. I tried to compute the covariance matrix of a dataset with 10,000 variables using both the old covariance function and the new function in the dsBetaTest package. I adjusted the old function from dsStats locally to improve its performance (https://github.com/datashield/dsStats/pull/3). When I tried the new function in the dsBetaTest package, my server (64 GB of RAM) quickly ran out of memory. Does the developer group plan to improve the performance of these functions? If not, I would love to contribute to these functions to make them feasible for larger datasets.

Best regards, Patrick

demetris.avraam · 4 April 2019 12:01

Hi Patrick,

It would be great if you could contribute to optimizing the performance of those functions. Just to let you know, the previous versions of ds.cov and ds.cor in the dsStatsClient package can only calculate the covariance and correlation matrices for each study separately. The new versions in the dsBetaTestClient package can calculate the matrices for each study as well as the combined matrices for the pooled analysis.
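To illustrate the difference between per-study and pooled matrices, here is a minimal sketch in plain Python of how a pooled covariance matrix can be built from per-study aggregates (row counts, column sums and cross-products) without moving individual-level rows. This is only an illustration of the general idea, not the actual ds.cov code; the function names `study_summaries` and `pooled_covariance` and the toy data are made up for this example.

```python
def study_summaries(rows):
    """Per-study aggregates: row count, column sums, and cross-product matrix X^T X."""
    p = len(rows[0])
    sums = [0.0] * p
    cross = [[0.0] * p for _ in range(p)]
    for row in rows:
        for i in range(p):
            sums[i] += row[i]
            for j in range(p):
                cross[i][j] += row[i] * row[j]
    return len(rows), sums, cross


def pooled_covariance(summaries):
    """Combine the aggregates of all studies into one sample covariance matrix."""
    p = len(summaries[0][1])
    n_total = sum(n for n, _, _ in summaries)
    sums = [sum(s[i] for _, s, _ in summaries) for i in range(p)]
    cross = [[sum(c[i][j] for _, _, c in summaries) for j in range(p)]
             for i in range(p)]
    # cov(i, j) = (sum(x_i * x_j) - sum(x_i) * sum(x_j) / N) / (N - 1)
    return [[(cross[i][j] - sums[i] * sums[j] / n_total) / (n_total - 1)
             for j in range(p)]
            for i in range(p)]


# Hypothetical toy data: two studies, two variables each.
study_a = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]
study_b = [[4.0, 8.0], [5.0, 9.0]]

pooled = pooled_covariance([study_summaries(study_a), study_summaries(study_b)])
```

Only the aggregates leave each study, and their size depends on the number of variables rather than the number of rows. For 10,000 variables, each dense cross-product matrix still has 10,000 × 10,000 entries (about 800 MB as 8-byte doubles), so intermediate copies add up quickly. The single-pass formula used here can also lose precision when the means are large relative to the variances.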

patrick.fischer · 8 April 2019 12:57

Hi Demetris,

I have already made good progress on improving this function. However, I’m still unsure how to test it on the server I have already set up. Do you have any advice?

demetris.avraam · 15 April 2019 09:49

Hi Patrick,

To test your development you can either use the new DataSHIELD Interface (see the discussion here: Datashield R API) or the “traditional” way of uploading the function as a script to your server (you can find a lot of information in the DataSHIELD wiki: https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/12943448/Notes+for+developers). Otherwise, we can arrange a Skype call where I can show you how to test the function.
