Troubleshooting Connectivity Issues in DataSHIELD Analysis

Dear all,

We have been running DataSHIELD analysis on 8 studies for some time now with no significant issues. However, over the last couple of weeks we started experiencing connectivity issues: certain studies took much longer to log into, assign data or derive variables, which made it impossible to proceed with the analysis. Every time, there was a different time-out error.

In the meantime, we investigated the issue from all ends (the Analysis Server and the individual Opal servers) without much success. When we tried again today, the issue seemed to have been resolved, but we don’t know what caused it.

Do you have any hints on how to proceed if we encounter that issue again? Is there a way to produce more logs? Your help is greatly appreciated.

Best regards,

Sofia

Hi Sofia,

I agree it can sometimes be difficult to know what went wrong. Are you able to provide any more background details such as:

  1. The size of the data sets being analysed
  2. Number of users, and whether they would all be running analyses at the same time
  3. Specification of the machines hosting Opal/R (memory, number of processors)
  4. Any more details of the error messages

Tom

Hi Tom,

Thank you for the quick response!

  1. The datasets are of varying size, ranging from approximately 60 KB to 1 MB.

  2. We have two users, and their analysis tasks were not running concurrently.

We thought that the issue could relate to the new version of R (4.3.1) on the client side but now it seems to be working fine with that version.

  3. Unfortunately, I don’t have all the details about the hosting machines. Do you happen to know if there’s a way to retrieve the R versions? Of course, I am happy to ask the hosting institutes, but perhaps there is another way? The Opal servers are on versions > 4.5.

  4. Here are a couple of the errors:

Thank you for your insights and help!

Sofia

Hi,

It is a network failure. Your network connection is most likely faulty at times.

By the way, you can see the R version, the number of active DataSHIELD sessions, and the amount of free memory on the R server by going to the Opal Administration > R page.

Regards
Yannick
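
On the earlier question about producing more logs: one client-side option is to time each step per study and record any error message, so that intermittent time-outs can be tied to a particular server and time of day. The following is only a minimal sketch in base R. The helper `timed_step` and the two placeholder steps are hypothetical; in real use, `fun` would wrap a call against a single study, for example a `DSI::datashield.login()` for that server only.

```r
# Hypothetical helper (not part of DSI): run one step, record its duration
# and the error message if it fails.
timed_step <- function(study, step, fun) {
  started <- Sys.time()
  error_message <- NA_character_
  elapsed <- system.time(
    tryCatch(fun(), error = function(e) error_message <<- conditionMessage(e))
  )[["elapsed"]]
  data.frame(
    study = study,
    step = step,
    started = format(started, "%Y-%m-%d %H:%M:%S"),
    seconds = elapsed,
    error = error_message,
    stringsAsFactors = FALSE
  )
}

# Placeholder steps standing in for real per-study calls.
step_log <- rbind(
  timed_step("study1", "ok_step", function() Sys.sleep(0.1)),
  timed_step("study2", "failing_step", function() stop("timeout was reached"))
)
print(step_log)
```

Appending such rows to a file over several days would show whether slow or failing steps cluster on one study or affect all of them at once, which helps separate a local network problem from a problem at one hosting institute.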

Hi Yannick,

Thank you for your reply. Is there a way to retrieve this information from the hosting institutes’ servers when you are not the Opal Administrator and just have analysis permissions (view dictionary and summaries)? I can get the Opal version information when running DSI::datashield.connections() but not the R version.

Many thanks,

Sofia

You are right that the R version would be useful (and non-disclosive) information for the analyst. It could be part of the connection info, just like the Opal version, as you mentioned. I’ll add it.

In the meantime, you can add a new DataSHIELD aggregate method to the DataSHIELD profile: allow a function named R.Version that maps to base::R.Version.
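
For an Opal administrator who prefers scripting to the web interface, the method could presumably also be registered with the opalr package. This is a sketch under assumptions: the wrapper name `allow_r_version`, the URL, the credentials and the profile name `"default"` are placeholders, and it needs administrator rights on the Opal server.

```r
# Sketch for an Opal administrator; needs the opalr package and admin rights.
# The wrapper name and all argument values are placeholders.
allow_r_version <- function(url, username, password, profile = "default") {
  o <- opalr::opal.login(username = username, password = password, url = url)
  on.exit(opalr::opal.logout(o))
  # Expose base::R.Version as the aggregate method "R.Version" in the profile.
  opalr::dsadmin.set_method(
    o,
    name = "R.Version",
    func = "base::R.Version",
    type = "aggregate",
    profile = profile
  )
}

# Example call (not run here; URL and credentials are placeholders):
# allow_r_version("https://opal.example.org", "administrator", "password")
```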

Then, as an analyst, you will be allowed to run R.Version like this:

> datashield.aggregate(conns, "R.Version()")
  Aggregated (...) [=====================================================================] 100% / 0s
$study1
$study1$platform
[1] "x86_64-pc-linux-gnu"

$study1$arch
[1] "x86_64"

$study1$os
[1] "linux-gnu"

$study1$system
[1] "x86_64, linux-gnu"

$study1$status
[1] ""

$study1$major
[1] "4"

$study1$minor
[1] "3.1"

$study1$year
[1] "2023"

$study1$month
[1] "06"

$study1$day
[1] "16"

$study1$`svn rev`
[1] "84548"

$study1$language
[1] "R"

$study1$version.string
[1] "R version 4.3.1 (2023-06-16)"

$study1$nickname
[1] "Beagle Scouts"

Regards
Yannick
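
With several studies connected, the full R.Version() list is long. A minimal sketch of condensing it to one version string per study, in base R only: the `versions` object below is a hand-written stand-in for the result of the aggregate call, and the `study2` entry is invented purely for illustration.

```r
# Stand-in for the result of datashield.aggregate(conns, "R.Version()"):
# one R.Version() list per study. The study2 values are made up.
versions <- list(
  study1 = list(major = "4", minor = "3.1"),
  study2 = list(major = "4", minor = "2.2")
)

# Build "major.minor" for each study, keeping the study names.
r_versions <- vapply(
  versions,
  function(v) paste(v$major, v$minor, sep = "."),
  character(1)
)
print(r_versions)

# Group the study names by R version to spot servers that differ.
by_version <- split(names(r_versions), r_versions)
print(by_version)
```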