How should we name our community?

Recently, at various meetings, there has been some confusion about the name “DataSHIELD”: is it only the R software used for federated analysis, or does it also include the whole community of software and processes that are required for working with federated data in a non-disclosive way (e.g. including software like Molgenis/Armadillo, the OBiBa suite, Coral, etc., but also the processes we use to communicate this way of working to governance/ethics groups)?

So far this has come up at the Full Stack technical meetings, the ongoing community governance workshops, and the recent EUCAN-Connect assembly, with people agreeing that clarification is needed so that we can distinguish the R packages from the overall community.

This post is being made on behalf of the technical group who met in the recent Full Stack meeting to request input into this issue from across the community:

  • Do you have a suggestion for a new name for the community?

  • Or, do you have alternate ideas for how to distinguish the R packages from the overall community?

Please use this thread for discussion (as a reminder, you can turn email notifications on or off by selecting from the drop-down menu at the bottom of the thread).

We will continue to discuss this topic over the next few months. And, perhaps it will be possible for us to reach agreement by the time of the conference in Barcelona :slight_smile:

Thanks for opening this thread, Tom!

Indeed, while working on the community governance documents, there has been quite some discussion around the word “DataSHIELD”, as different people infer different concepts/structures behind it.

From my point of view, I have always described “DataSHIELD” as a series of R packages, and thus would stick with the term “DataSHIELD” for those R packages. Personally, though, I think that the new name should not diverge too much from “DataSHIELD” itself, as much of the software is associated with the R packages.

My ideas for a term representing the whole community of software related to DataSHIELD would be:


What do others think?

It’s nice that we have grown so much that we have this problem :slight_smile:

In my mind “DataSHIELD” is the core R packages. I do see the confusion around the other tools in the tool box though.

I guess it depends on who actually belongs to the community at present - does everyone either use or facilitate the use of the DataSHIELD R packages?

Maybe DataSHIELD R packages and DataSHIELD users existing in the DataSHIELD ecosystem?


Here are my definitions:

  • For me, “DataSHIELD” is more a “method”: making distributed, privacy-preserving computations. The validity of DataSHIELD results can be proved mathematically. The R packages and the computation node infrastructure (Opal, etc.) are just implementation details. One could imagine a Python-based DataSHIELD implementation, for instance.

  • The “DataSHIELD community” is the group of users/developers/ethics experts etc. that are making DataSHIELD a reality.

  • The “DataSHIELD ecosystem” is the toolbox (R packages, Opal, resources, docker images, tutorials etc).


I agree with Yannick’s point about DataSHIELD representing a method. If there was a new development of the same analysis tool in Python, my instinct is that it would be an example of DataSHIELD too.

However, I have a thought-experiment dilemma: what if Professor X of Consortium Y took it upon themselves to code it all up in Python, and the Python package then took off and soon had three times as many monthly users as the R version? If theirs became the bigger of the two DataSHIELD implementations, could they call themselves the “leader” and subvert Becca’s role as current PI? If there was a disagreement over, say, a new disclosure method that the R team wanted to implement but Professor X didn’t want to implement for Python DataSHIELD, would both remain DataSHIELD?

I’m not sure about the above, and not sure whether it’s an argument in favour of creating a new name for the wider “method”.

I respectfully request that these discussions take a pause. The one person who might have an interest in contributing, as initiator of the DataSHIELD method/ecosystem, is entirely excluded because he is in a hospital fighting for his life. This is not hyperbole; this is Paul’s reality. Please pay him the respect that he is due and allow the time needed to enable him to engage. Facilitating broad and appropriate engagement is surely the very essence of a collaborative and democratic community process.

Yannick’s comments are a solid foundation for an interregnum.

Madeleine (DataSHIELD ethics and governance advisor)

Prof Madeleine Murtagh Chair of Social Data Science University Of Glasgow

An excellent description


Dear @Madeleine_Murtagh,

I am very sorry to hear that Paul is in hospital, I hope he responds well (and quickly) to treatment. Please do pass on my best wishes for a rapid recovery and let him know that I am thinking about him - indeed, I’m sure I can speak for many here and say that we are all concerned and thinking about him, and wish him the very best.

As @tombishop mentioned in the first post, this is a question that has come up repeatedly in multiple fora - including at meetings where Paul was in attendance (indeed, Paul also acknowledged at several of the recent governance workshops that clarifying names was a very important thing and something that we need to do quite urgently). Furthermore, the discussion will be going on for at least the next couple of months: the suggestion above seems to be that no firm decisions are made before the conference in October. I am not sure therefore that “pausing” the discussion is helpful (when would you like it to restart?), particularly as there may well be natural pauses as people take holidays over the summer. And, of course, Paul’s is definitely a voice that we want to be included in the discussion, so I for one hope that it will be possible for him to participate in the discussion again very soon.

Best wishes,