Using specific (conda) environment when running functions on the server-side

Hi,

Currently we need to run TensorFlow using DataSHIELD.

Generally the procedure would be: 1) set up a conda environment that contains Python and TensorFlow; 2) install the tensorflow R package; 3) load this conda environment in R; 4) run tensorflow commands.
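For reference, outside of DataSHIELD the four steps above can be sketched in R roughly like this (the environment name `r-tensorflow` is an assumption; adjust it to your own setup):

```r
# 1) In a shell, create a conda environment containing Python and TensorFlow:
#      conda create -n r-tensorflow python tensorflow

# 2) Install the tensorflow R package
install.packages("tensorflow")

# 3) Point reticulate (and hence the tensorflow package) at the conda environment
library(reticulate)
use_condaenv("r-tensorflow", required = TRUE)

# 4) Run tensorflow commands
library(tensorflow)
tf$constant("Hello, TensorFlow")
```

The question below is essentially how to get step 3 to happen inside the server-side R session that DataSHIELD controls.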

So I’m wondering whether it is possible to do this with DataSHIELD, i.e. use a specific conda environment when running functions on the server side, and if so, how? Thanks a lot.

Best regards, K C

Hi Kalvin,

That is a very ambitious and exciting endeavour. We would need to help you with setting up all the disclosure controls related to the deep machine learning techniques. Please let me know when you are ready with that element.

Best wishes,

Patricia

https://www.datashield.ac.uk

Hi,

Looks like a good candidate for using resources :slight_smile: How will you handle the datasets? Are these stored in CSV files or TFRecord ones?

I have done a bit of a technology review, and there is good integration with R that is supported by RStudio: https://tensorflow.rstudio.com/ (it uses the reticulate package, which is a bridge between the R and Python worlds). Will you use this tensorflow R package?

As there seem to be quite a lot of dependencies, I think that building a specific R server Docker image, including all the TensorFlow-related stuff, would help. You can use obiba/opal-rserver as the base image.
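As a rough sketch, such an image could start from the suggested base image like this (the package list is an assumption, not a tested recipe; `tensorflow::install_tensorflow()` is the helper the tensorflow R package provides for setting up the Python side):

```dockerfile
# Sketch only: R server image with the TensorFlow stack pre-installed
FROM obiba/opal-rserver

# Assumption: install the R-side packages
RUN Rscript -e "install.packages(c('reticulate', 'tensorflow'))"

# Assumption: let the tensorflow package set up a Python environment
RUN Rscript -e "tensorflow::install_tensorflow()"
```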

Regards
Yannick

Hi Yannick, Thanks for the reply.

  1. For the datasets, we might store them in CSV files.
  2. Yes. I use the tensorflow R package.
  3. Can I ask: if I have installed DataSHIELD + Opal with the RPM package instead of using a Docker image, is there any way to load a specific conda environment when running functions on the server side? Or, if I installed all the dependencies in the system environment instead of in a conda environment, will the server-side commands use the system environment directly?

Thanks a lot. Best regards, Kalvin

Hi,

We need to discuss disclosure controls and how we could integrate them within our client-server development. We will discuss this on Monday. :slight_smile: P.

Hi Kalvin,

  1. Are these CSV files big? I mean, big enough that importing the data into the database and assigning it in the R session takes too much time? If this is the case, then using the resources concept would be appropriate.

  2. Are you using the tfdatasets package as well? In that case a “resource” could wrap this.

  3. The advantage of using Docker for the R server is that you could prepare a fully operational R environment without polluting the hosting system, but that is not a requirement; you can keep your RPM install. Regarding the conda environment: if it is the same for every R server session, you can set it up in the Rprofile.R file as described in the doc; if the conda setup is a runtime choice of the DS user, you can declare a DS “aggregate” function in Opal for that purpose.
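To illustrate both options, here is a minimal sketch (the environment name `r-tensorflow` and the function name `useCondaEnvDS` are assumptions, not part of Opal or DataSHIELD):

```r
# Option A: same conda environment for every R server session.
# These lines would go in the R server's Rprofile.R startup file.
library(reticulate)
use_condaenv("r-tensorflow", required = TRUE)

# Option B: let the DS user choose the environment at runtime.
# A server-side function like this could be declared as a DS
# "aggregate" method in Opal (function name is hypothetical).
useCondaEnvDS <- function(envname) {
  reticulate::use_condaenv(envname, required = TRUE)
  invisible(TRUE)
}
```

Option A is simpler to administer; option B would need its own disclosure-control review before being exposed to DS users.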

Regards
Yannick

Hi,

Thanks for the reply. Yes. See you on Monday.

Best regards, Kalvin