Using specific (conda) environment when running functions on the server-side

Hi,

Currently we need to run TensorFlow using DataSHIELD.

Generally the procedure would be: 1) set up a conda environment that contains Python and TensorFlow; 2) install the tensorflow R package; 3) load this conda environment in R; 4) run tensorflow commands.
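For reference, outside of DataSHIELD the four steps above can be sketched in R roughly like this (the environment name `r-tensorflow` is an assumption; adjust it to your own setup):

```r
# 1) In a shell, create a conda environment containing Python and TensorFlow:
#      conda create -n r-tensorflow python tensorflow

# 2) Install the tensorflow R package
install.packages("tensorflow")

# 3) Point reticulate (and hence the tensorflow package) at the conda environment
library(reticulate)
use_condaenv("r-tensorflow", required = TRUE)

# 4) Run tensorflow commands
library(tensorflow)
tf$constant("Hello, TensorFlow")
```

The question below is essentially how to get step 3 to happen inside the server-side R session that DataSHIELD controls.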

So I’m wondering whether it is possible to do this with DataSHIELD, i.e. use a specific conda environment when running functions on the server side, and if so, how? Thanks a lot.

Best regards, K C

Hi Kalvin,

That is a very ambitious and exciting endeavour. We would need to help you with setting up all the disclosure controls related to the deep machine learning techniques. Please let me know when you are ready with that element.

Best wishes,

Patricia

https://www.datashield.ac.uk

Hi,

Looks like a good candidate for using resources :slight_smile: How will you handle the datasets? Are these stored in CSV files or TFRecord ones?

I have done a bit of a technology review, and there is good integration with R that is supported by RStudio: https://tensorflow.rstudio.com/ (it uses the reticulate package, which is a bridge between the R and Python worlds). Will you use this tensorflow R package?

As there seem to be quite a lot of dependencies, I think that building a specific R server Docker image, including all the TensorFlow-related stuff, would help. You can use obiba/opal-rserver as the base image.
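As a rough sketch, such an image could start from the suggested base image like this (the package list is an assumption, not a tested recipe; `tensorflow::install_tensorflow()` is the helper the tensorflow R package provides for setting up the Python side):

```dockerfile
# Sketch only: R server image with the TensorFlow stack pre-installed
FROM obiba/opal-rserver

# Assumption: install the R-side packages
RUN Rscript -e "install.packages(c('reticulate', 'tensorflow'))"

# Assumption: let the tensorflow package set up a Python environment
RUN Rscript -e "tensorflow::install_tensorflow()"
```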

Regards
Yannick

Hi Yannick, Thanks for the reply.

  1. For the datasets, we might store them in CSV files.
  2. Yes. I use the tensorflow R package.
  3. Can I ask: if I have installed DataSHIELD + Opal with the RPM package instead of using a Docker image, is there any way to load a specific conda environment when running functions on the server side? Or, if I installed all the dependencies in the system environment instead of in a conda environment, will the server-side commands use the system environment directly?

Thanks a lot. Best regards, Kalvin

Hi,

We need to discuss disclosure controls and how we could integrate them within our client-server development. We will discuss this on Monday. :slight_smile: P.

Hi Kalvin,

  1. Are these CSV files big? I mean, big enough that importing the data into the database and assigning it in the R session takes too much time? If this is the case, then using the resources concept would be appropriate.

  2. Are you using the tfdatasets package as well? In that case a “resource” could wrap this.

  3. The advantage of using Docker for the R server is that you could prepare a fully operational R environment without polluting the hosting system, but that is not a requirement; you can keep your RPM install. Regarding the conda environment: if it is the same for every R server session, you can set it up in the Rprofile.R file as described in the doc; if the conda setup is a runtime choice of the DS user, you can declare a DS “aggregate” function in Opal for that purpose.
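To illustrate both options, here is a minimal sketch (the environment name `r-tensorflow` and the function name `useCondaEnvDS` are assumptions, not part of Opal or DataSHIELD):

```r
# Option A: same conda environment for every R server session.
# These lines would go in the R server's Rprofile.R startup file.
library(reticulate)
use_condaenv("r-tensorflow", required = TRUE)

# Option B: let the DS user choose the environment at runtime.
# A server-side function like this could be declared as a DS
# "aggregate" method in Opal (function name is hypothetical).
useCondaEnvDS <- function(envname) {
  reticulate::use_condaenv(envname, required = TRUE)
  invisible(TRUE)
}
```

Option A is simpler to administer; option B would need its own disclosure-control review before being exposed to DS users.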

Regards
Yannick

Hi,

Thanks for the reply. Yes. See you on Monday.

Best regards, Kalvin