Implementing a ROC function

I’m currently implementing a ROC curve function that runs on the server and sends the (specificity, sensitivity) pairs to the client, which then draws the curves from all servers in a single plot.
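For readers unfamiliar with the computation, here is a minimal sketch of how such (specificity, sensitivity) pairs can be derived from scores and binary labels. It is written in Python purely for illustration and is not the code in the linked repositories; the function name `roc_points` and the convention that label 1 means "positive" are assumptions.

```python
def roc_points(scores, labels):
    """Return (specificity, sensitivity) pairs, one per distinct score threshold.

    Assumption: labels are 0/1 and a higher score means "more likely positive".
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Walk through the observations from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Start with a threshold above every score: nothing is predicted positive.
    points = [(1.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all observations tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((1.0 - fp / negatives, tp / positives))
    return points
```

Note that the thresholds themselves are not part of the returned pairs, only the resulting rates.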

Client: dsBaseClient/ds.roc.R at master · rapus95/dsBaseClient · GitHub

Server: dsBase/rocDS.R at master · rapus95/dsBase · GitHub

Since this is my first encounter with, and contribution to, both R and DataSHIELD, I’d be happy about any feedback :smiley: So feel free to have a look at it (the code is fairly short and simple) and give me some input :see_no_evil:

I’d also like some guidance on how to test that function properly :see_no_evil:

Greets, Aaron

Hi Aaron, Thanks. Have you considered if this method could be disclosive? Stuart

Hi Aaron,

this is a really interesting function that you’re developing within the DataSHIELD environment. Can I check in on how the development process is going for you?

  • Is there anything about ROC curves that you are finding challenging to implement?
  • Can you think of anything that would make it easier for third parties like yourself to develop code for DataSHIELD? (For instance: more wiki support in the developer section, more detailed instructions on the process of developing non-disclosive functions, or a clearer GitHub infrastructure for DataSHIELD files?)

Well yes, thinking about whether it’s disclosive brought me to the point where I had to accept that everything, no matter how aggregated, has the potential to be disclosive. So I settled on the hope that people won’t use it to disclose data, just as other functions (like subarray indexing) currently do. Smoothing would help anonymize the data, though.

What is particularly challenging to implement is the aggregation of ROC curves from different sources. To reduce disclosiveness, I don’t share the threshold for each data point. But that also means the curves are not comparable point by point, so it’s impossible to create a combined curve.
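To illustrate why the thresholds matter for combining curves, here is a sketch in Python (for illustration only, not what the linked code does). It assumes that every source evaluates the same agreed list of thresholds and returns confusion counts per threshold; the function names `site_counts` and `pooled_roc` are hypothetical. Under that assumption the counts can simply be summed across sources. Sharing counts at known thresholds reveals more than sharing rates alone, so this has disclosure implications of its own.

```python
def site_counts(scores, labels, thresholds):
    """Per-threshold (tp, fp, fn, tn) counts for one source.

    Assumption: an observation is predicted positive when score >= threshold.
    """
    counts = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        counts.append((tp, fp, fn, tn))
    return counts


def pooled_roc(per_site_counts):
    """Sum counts across sources and return (specificity, sensitivity) per threshold."""
    pooled = []
    # zip(*...) groups the counts of all sources that belong to the same threshold.
    for rows in zip(*per_site_counts):
        tp = sum(r[0] for r in rows)
        fp = sum(r[1] for r in rows)
        fn = sum(r[2] for r in rows)
        tn = sum(r[3] for r in rows)
        if tp + fn == 0 or tn + fp == 0:
            raise ValueError("pooled data needs both positive and negative labels")
        pooled.append((tn / (tn + fp), tp / (tp + fn)))
    return pooled
```

Without the shared threshold grid, the points of one source's curve cannot be matched to the points of another's, which is exactly the comparability problem described above.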

Regarding ease of use for developers, some additional tooling in the form of convenience functions would indeed be nice. Having to string-wrap each call manually as a sort of custom remote procedure call feels too close to the metal IMO. I also played with the idea of using gRPC/Protocol Buffers as the interface. That could be an interesting development: on the one hand it would provide clear interfaces by design (design by contract), and on the other it would decouple the interface from R, which interests me in particular since I’m currently playing around with a front-end in another language (Julia). Having to call Julia → R DataSHIELD client → R DataSHIELD server → R DataSHIELD client → Julia feels very unstable because of all the points where errors could appear :see_no_evil: But given that DataSHIELD is a very good idea, that’s a price worth paying, I guess :smiley:

Hi Aaron,

Currently, only an R client can perform DataSHIELD R operations, because the transferred objects are serialized by R (and only R can deserialize them). I am thinking about opening the Opal R API to clients in any language, using JSON as the serialization format. Method calls would still be expressed as R expressions (the DS R server is not to be replaced), but the returned result would be a JSON message (serialized by jsonlite). Aggregation results are usually simple data structures, so I assume there will be no issue with the JSON formatting. Ideally, the client helper packages such as DSI and dsBaseClient would also need to be implemented in the other languages (probably Julia and Python). For the end user, this would add a lot of flexibility in terms of setting up a DS environment.
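As a rough sketch of what the client side could look like in Python: the message shape below (an object with `specificity` and `sensitivity` arrays of equal length) is purely hypothetical and not an existing Opal or DataSHIELD format, and `parse_roc_message` is an illustrative name. The point is only that a simple aggregation result needs nothing beyond a standard JSON parser.

```python
import json


def parse_roc_message(message):
    """Turn a JSON aggregation result into (specificity, sensitivity) pairs.

    Hypothetical payload: {"specificity": [...], "sensitivity": [...]}.
    """
    payload = json.loads(message)
    specificity = payload["specificity"]
    sensitivity = payload["sensitivity"]
    if len(specificity) != len(sensitivity):
        raise ValueError("specificity and sensitivity must have the same length")
    return list(zip(specificity, sensitivity))


# Example message as a non-R client might receive it (values made up for illustration).
example = '{"specificity": [1.0, 0.5, 0.0], "sensitivity": [0.0, 0.5, 1.0]}'
pairs = parse_roc_message(example)
```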

Does anyone have an opinion about such a feature?

Regards
Yannick