synDS not in the server used in the tutorial for dsSynthetic?

Dear all,

I’m interested in trying out dsSynthetic by following the tutorial (dsSynthetic: A DataSHIELD package to generate synthetic data). The connection to the sandbox server works fine, but when I call ds.syn I receive the error:

datashield.errors()
$server1
[1] "[Client error: (400) Bad Request] No such DataSHIELD 'AGGREGATE' method with name: synDS"

Checking the available methods by

datashield.methods(connections, type = "aggregate")

the only available methods seem to be from dsBase, version 6.3. Am I missing something, or does the server need to be updated to allow synDS?
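A minimal sketch of that check, in case it helps others who hit the same error. It assumes the table returned by datashield.methods has a `name` column and a `package` column; `has_method` is a hypothetical helper, and `mock_methods` is an illustrative stand-in for the real server response so that the snippet runs without a connection.

```r
# Hypothetical helper: TRUE if the methods table lists the given method name.
has_method <- function(methods, method_name) {
  method_name %in% methods$name
}

# Mock table standing in for
#   datashield.methods(connections, type = "aggregate")
# The rows below are illustrative, not copied from the sandbox server.
mock_methods <- data.frame(
  name = c("meanDS", "varDS", "quantileMeanDS"),
  package = "dsBase",
  version = "6.3",
  stringsAsFactors = FALSE
)

# Which packages provide the listed methods?
packages_present <- unique(mock_methods$package)
print(packages_present)

# Is the server-side function that ds.syn needs available?
print(has_method(mock_methods, "synDS"))
print(has_method(mock_methods, "meanDS"))
```

With a live connection, the mock table would be replaced by the actual result of the datashield.methods call.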

Kind regards, Bodil

Hi Bodil,

I’m pleased that you want to try out dsSynthetic! I can see that the package is not there on the sandbox server and will try and put that right this morning.

I’d be interested to hear how you get on. I suspect the problem we might run into is that it is easy to overwhelm the server when requesting a synthetic dataset as it requires a lot of processing power for larger datasets.

Tom

Hi Bodil,

I have made some changes and the dsSynthetic package should now be there. Do let me know if there are any other issues.

Tom

Thank you Tom for your rapid response!

So far I’ve checked that I can get a local synthetic data set (10 000 observations, 10 variables) by following sections 3.1-3.2 of the tutorial. I used seed=123 in the ds.syn call, as in the tutorial, but head(synth_data) does not give exactly the same output as the tutorial shows. However, repeating the call with the same seed gives a data set identical to the first one I got, and the mean of e.g. LAB_TCS is very close in the real data set on the server and in the synthetic data (5.860897 vs. 5.865847), so I’m quite happy! The slight difference between the synthetic data in the tutorial and mine might be due to a newer version of R and/or dsBase, and I would not worry too much about it.
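The two checks described above can be sketched in base R. This is only an illustration: `toy_syn` is a hypothetical stand-in that draws random numbers after fixing a seed, not the dsSynthetic or synthpop algorithm, and its mean and standard deviation are made-up values. The two means are the ones quoted above.

```r
# Stand-in generator, NOT dsSynthetic: fixes the seed, then draws n values.
# The mean and sd used here are arbitrary illustration values.
toy_syn <- function(n, seed) {
  set.seed(seed)
  data.frame(LAB_TCS = rnorm(n, mean = 5.86, sd = 1))
}

# Check 1: the same seed in the same R session gives an identical data set.
first  <- toy_syn(10000, seed = 123)
second <- toy_syn(10000, seed = 123)
same_seed_identical <- identical(first, second)
print(same_seed_identical)

# Check 2: relative difference between the real and synthetic means.
real_mean  <- 5.860897
synth_mean <- 5.865847
rel_diff <- abs(synth_mean - real_mean) / real_mean
print(rel_diff)
```

The relative difference comes out below 0.1%, which fits the impression that the synthetic mean tracks the real one closely.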

I’ll continue testing in the next couple of days!

KR Bodil