Implementing a ROC function

rapus95 · 2 February 2021 09:04

I’m currently trying to implement a ROC curve function which is run on the server and sends the pairs of specificity and sensitivity to the client which then plots all curves into the same plot.
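
For readers new to the idea, here is a minimal sketch in Python of what such a server-side computation could look like. It is not the R code linked in this post; the function name `roc_points` and the input layout (binary labels plus numeric scores) are assumptions made for illustration. It returns only the (specificity, sensitivity) pairs and no thresholds.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (specificity, sensitivity) pairs, one per distinct score cut-off.

    Hypothetical helper for illustration; labels are 1 (case) or 0 (control).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Walk through the observations from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    # Cut-off above every score: nothing is predicted positive.
    points = [(1.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume all observations tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((1 - fp / n_neg, tp / n_pos))
    return points
```

A client would then only need to draw one line per server from the returned pairs.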

Client: dsBaseClient/ds.roc.R at master · rapus95/dsBaseClient · GitHub

Server: dsBase/rocDS.R at master · rapus95/dsBase · GitHub

Given that this is my first encounter with, and contribution to, both R and DataSHIELD, I’d be happy about any feedback. Feel free to have a look at it (the code is relatively short and simple) and give me some input.

I’d also like some guidance on how to test that function properly.

Greets, Aaron

swheater · 3 February 2021 09:29

Hi Aaron, Thanks. Have you considered if this method could be disclosive? Stuart

alexwesterberg · 12 February 2021 10:40

Hi Aaron,

this is a really interesting function that you’re developing within the DataSHIELD environment. Can I check in with how the development process is going for you?

Is there anything about ROC curves that you are finding challenging to implement?
Can you think of anything that would make it easier for third parties like yourself to develop code for DataSHIELD? (For instance: more wiki support in the developer section, more detailed instructions on the process of developing non-disclosive functions, or clearer GitHub infrastructure for DataSHIELD files?)

rapus95 · 27 February 2021 14:37

Well yes, thinking about whether that’s disclosive brought me to the point where I had to accept that everything, no matter how aggregated it is, holds the potential to be disclosive. Thus I settled on the hope that people won’t use it to disclose data, just as is currently the case for other functions (like subarray indexing). Smoothing would help anonymize the data, though.

Particularly challenging to implement is the aggregation of ROC curves from different sources. To make the function less disclosive, I don’t share the thresholds for each data point. But that also means the curves are not comparable with each other, and thus it’s impossible to create a combined curve.
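
To illustrate the trade-off, here is a minimal Python sketch of how a combined curve could be built if all sites evaluated the same agreed grid of thresholds and returned only counts. This is an assumption for illustration, not what the linked code does, and it says nothing about whether returning such counts would pass a disclosure check. The names `site_counts` and `pooled_roc` are hypothetical.

```python
def site_counts(labels, scores, thresholds):
    """Per-site true/false positive counts at each agreed threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    counts = []
    for t in thresholds:
        # An observation is predicted positive when its score reaches t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        counts.append((tp, fp))
    return {"n_pos": n_pos, "n_neg": n_neg, "counts": counts}


def pooled_roc(site_results):
    """Sum counts across sites, then derive (specificity, sensitivity)."""
    n_pos = sum(r["n_pos"] for r in site_results)
    n_neg = sum(r["n_neg"] for r in site_results)
    n_thresholds = len(site_results[0]["counts"])
    points = []
    for k in range(n_thresholds):
        tp = sum(r["counts"][k][0] for r in site_results)
        fp = sum(r["counts"][k][1] for r in site_results)
        points.append((1 - fp / n_neg, tp / n_pos))
    return points


# Usage with two made-up sites sharing one threshold grid.
grid = [0.25, 0.5, 0.75]
site_a = site_counts([0, 1, 1], [0.2, 0.6, 0.9], grid)
site_b = site_counts([0, 0, 1], [0.3, 0.7, 0.8], grid)
combined = pooled_roc([site_a, site_b])
```

Because the counts are aligned by threshold, they can simply be summed. Without a shared grid, the per-site points have no common index to sum over, which is the comparability problem described above.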

Regarding ease of use for developers, some additional tooling in the form of convenience functions would indeed be nice. Having to string-wrap each call manually as some sort of custom remote procedure call feels too close to the metal IMO. I also played with the idea of using gRPC/Protocol Buffers as the interface. That could be an interesting development, given that it would provide clear interfaces by design (design by contract) and would also loosen the interface, which would be interesting for me in particular since I’m currently playing around with a front-end in another language (Julia). Having to call Julia → R DataSHIELD client → R DataSHIELD server → R DataSHIELD server → Julia feels very unstable due to all the points where errors could appear. But given that DataSHIELD is a very good idea, that’s a price worth paying, I guess.
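
As a rough idea of the kind of convenience function meant here, the following Python sketch builds the R call string from a function name and arguments instead of concatenating it by hand. It is not part of any DataSHIELD package: `r_call` is a hypothetical helper, and the argument names passed to `rocDS` in the usage lines (including `n_points`) are made up for illustration.

```python
import re

# Conservative pattern for a syntactic R function name.
_R_NAME = re.compile(r"[A-Za-z.][A-Za-z0-9._]*")


def r_call(func, *args, **kwargs):
    """Build an R call expression as a string, e.g. 'f(a, b, k=v)'."""
    if not _R_NAME.fullmatch(func):
        raise ValueError(f"not a valid R function name: {func!r}")
    parts = [str(a) for a in args]
    parts += [f"{name}={value}" for name, value in kwargs.items()]
    return f"{func}({', '.join(parts)})"


# Usage: the symbols refer to server-side objects, so they stay unquoted.
plain = r_call("rocDS", "D$outcome", "D$score")
named = r_call("rocDS", "D$outcome", "D$score", n_points=50)
```

A helper like this only removes the manual string handling; the call is still sent as an R expression.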

yannick · 28 February 2021 08:17

Hi Aaron,

Currently, only an R client can perform DataSHIELD R operations, because the transferred objects are serialized by R (and then only R can deserialize them). I am thinking about the possibility of opening the Opal R API to clients in any language, using JSON as the serialization format. Method calls would still be expressed as R expressions (the DS R server is not to be replaced), but the returned result would be a JSON message (serialized by jsonlite). Aggregation results are usually simple data structures, so I assume there will be no issue with the JSON formatting. Ideally, the client helper packages such as DSI and dsBaseClient would also need to be implemented in different languages (Julia, probably Python). For the end user, this would add a lot of flexibility in terms of setting up a DS environment.
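
To make the idea concrete, here is a small Python sketch of how a ROC aggregation result could travel as JSON and be read back by a non-R client. The message layout (two parallel arrays named `specificity` and `sensitivity`) is an assumption for illustration only, not an existing Opal or jsonlite format.

```python
import json


def encode_roc_result(points):
    """Serialize (specificity, sensitivity) pairs as a JSON message."""
    return json.dumps({
        "specificity": [p[0] for p in points],
        "sensitivity": [p[1] for p in points],
    })


def decode_roc_result(message):
    """Parse the JSON message back into (specificity, sensitivity) pairs."""
    data = json.loads(message)
    return list(zip(data["specificity"], data["sensitivity"]))


# Usage: round-trip a made-up result.
original = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
message = encode_roc_result(original)
restored = decode_roc_result(message)
```

Any language with a JSON parser could consume such a message, which is what would make clients in Julia or Python feasible.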

Does anyone have an opinion on having such a feature?

Regards
Yannick
