Using Resources in server functions. Can we have some example code, please?

Hi all,

Before resources, a DataSHIELD server function used either a data frame stored on the server under a given symbol or some server variables. How differently do we need to code our DataSHIELD server functions to accommodate the streaming of the data? In other systems, I might use database connection tools, such as JDBC. With large data files, I would use pooling techniques.

I was under the impression that resources would handle this completely transparently. Now, I am not convinced at all. Apologies for asking this dummy question. Can somebody please share some example code for developing generic server-side functions that can work with both resource and non-resource connections?

I have asked this question several times. I am sorry I have not made it clearer previously. Given the niche application to Omics and its packages, I am starting to think resources cannot be applied to other applications. Am I wrong?

Pat

Hi,

You are wrong :wink: Just do not expect any magic from the Resources: it is a formal way to build an access point to data or computation services, and it does not assume anything about how the data are structured, stored, or analyzed. That is why it applies to any domain, by the way.

More specifically, if you intend to access data in a database using JDBC, you will only get a resource that is a connection to a database through JDBC… What you could then do with such a resource is (1) coerce it to a data.frame (because this is the only data structure that dsBase is able to work with), (2) coerce it to a dplyr’s tbl object (which is more suitable for delegating the workload to the database itself), or (3) work directly with the database connection as you wish.

In addition to that, you will need to design a resource extension for working with JDBC, probably by using the RJDBC package. See the example of the odbc.resourcer package.
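The difference between options (1) and (2) can be sketched as follows. This is only an illustration, assuming the DBI, RSQLite, dplyr and dbplyr packages are installed: an in-memory SQLite table stands in for the remote database, no resource is involved, and the table name `patients` and column `age` are made up.

```r
library(DBI)
library(dplyr)

# In-memory SQLite database standing in for a remote SQL source (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "patients", data.frame(age = c(30, 40, 50)))

# Option (1): pull every row into R memory, then compute in R.
df <- dbReadTable(con, "patients")
mean_in_r <- mean(df$age, na.rm = TRUE)

# Option (2): build a lazy tbl; the mean is translated to SQL and computed
# by the database, and only the single result value comes back to R.
patients <- tbl(con, "patients")
mean_in_db <- pull(summarise(patients, m = mean(age, na.rm = TRUE)), m)

dbDisconnect(con)
```

With a small table both routes give the same value; the point of option (2) is that the rows never have to be loaded into the R session.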

Regards
Yannick

I thought I was always right, Yannick. Obviously not. That is good to know :stuck_out_tongue:

It is an informative answer and now I understand it a bit better. So, let me deepen my understanding further (and the community's too). Let’s use the function meanDS. Currently, we use a data frame and we give a symbol from the client. So we use arguments to pass the names of these R objects, which the meanDS function then identifies on the server.

Take a scenario: a Toronto hospital uses some Cloud medical system, Newcastle the traditional way (we are slow to adapt! :slight_smile:), and Paris some old systems that are integrated with database connection systems (like ODBC, JDBC). Will I need to rewrite meanDS so that it can work for Toronto? Will I need to write yet another meanDS for Paris?

I am trying to get my head around a general way to code for and adapt to this situation. Can you suggest a solution for my dummy scenario? I am very slow sometimes. Sorry. :frowning:

Best wishes,

Pat

PS: It is snowing in Newcastle today, only two inches, and it is chaos. :slight_smile:

If meanDS requires a symbol that represents a data.frame, the user must coerce the previously assigned resource to a data.frame. The resourcer package comes with its own DataSHIELD assign functions to do that: coerce to a data.frame, a dplyr’s tbl or an R object.

The end-user knows whether the object assigned in the DataSHIELD server environment is a data.frame or a resource, so s/he should know when the resource connection needs to be transformed into a tabular data format.
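On the client side, that workflow might look like the sketch below. It assumes an already opened set of DataSHIELD connections `conns`, the resourcer and dsBase packages installed on the servers, and a resource registered under the made-up name `project.my_resource`. The wrapper function `assign_resource_as_df` is hypothetical and only groups the two assign steps.

```r
# Hypothetical wrapper; `conns` is an existing set of DSI connections and
# "project.my_resource" is a placeholder resource name (assumptions).
assign_resource_as_df <- function(conns, resource = "project.my_resource") {
  # 1. Assign the resource: the server-side symbol "res" is a resource client.
  DSI::datashield.assign.resource(conns, symbol = "res", resource = resource)
  # 2. Coerce it to a data.frame under the symbol "D" using resourcer's
  #    as.resource.data.frame(); as.resource.tbl() would give a dplyr tbl.
  DSI::datashield.assign.expr(conns, symbol = "D",
                              expr = quote(as.resource.data.frame(res)))
  invisible("D")
}
```

After these two steps, `D` is an ordinary data.frame symbol on each server, so a dsBase client call such as `ds.mean("D$age", datasources = conns)` can be pointed at it (the column `age` is another placeholder).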

Possible improvements to meanDS:

  • if the symbol represents a resource, automatically coerce it to a tabular representation of the data,
  • rewrite meanDS to do it the dplyr way, because if the resource connects to a SQL table with billions of rows, coercing to a data.frame will bring all of them into memory…
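The first improvement could be sketched roughly as below. This is not dsBase's meanDS: the names `as_tabular` and `mean_any_ds` are hypothetical, disclosure checks are left out, and a plain environment with an `asDataFrame()` function stands in for a resourcer resource client (whose clients expose a method of that name), so the example runs without a server.

```r
# Hypothetical helper: return a data.frame whatever the symbol points to.
as_tabular <- function(x) {
  if (is.data.frame(x)) {
    return(x)
  }
  # Duck-typed check for an object exposing asDataFrame(), as resourcer's
  # resource clients do (R6 objects are environments).
  if (is.environment(x) && is.function(x$asDataFrame)) {
    return(x$asDataFrame())
  }
  stop("Expected a data.frame or a resource client", call. = FALSE)
}

# Hypothetical mean function; no disclosure control is applied here.
mean_any_ds <- function(x, column) {
  df <- as_tabular(x)
  mean(df[[column]], na.rm = TRUE)
}

# Stand-in for a resource client (assumption, for illustration only).
mock_client <- new.env()
mock_client$asDataFrame <- function() data.frame(age = c(30, 40, 50))

mean_from_df <- mean_any_ds(data.frame(age = c(30, 40, 50)), "age")
mean_from_resource <- mean_any_ds(mock_client, "age")
```

The same function body serves both kinds of symbol, but the coercion still loads every row into memory, which is the concern behind the second improvement.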

Please also have a look at this documentation: Using Resources with DataSHIELD

Regards
Yannick

That is useful. Next question.

The resourcer package comes with its own DataSHIELD assign functions to do that: coerce to a data.frame, a dplyr’s tbl or a R object.

Does it mean that, on the server, we will still have an R object with the symbol set by the client? Does it mean that meanDS may not need to be rewritten?

I did say I am a bit slow sometimes…

P.