Following a discussion with @swheater and @PatRyserWelch, I would like to hear your opinion, as DS developers, on the possibility of having Opal assign a table as a tibble instead of a plain old data.frame. The tibble data structure is a “modern reimagining of the data.frame” and is part of the Tidyverse project. It aims to be easier to work with and potentially faster. As an example, the dplyr package is a powerful library for manipulating tibbles (select, group, filter, join etc.).
Opal has been using the tibble format to push its tables to R for 2 years (tibble 1.2), except in a Datashield context, for legacy and backward compatibility reasons. One specificity of Opal’s data frames for Datashield is that the participant identifiers are set as the row names of the data frame instead of being a separate column. Tibbles do not allow row names to be set. In addition, using identifiers as row names makes it impossible to have a data frame with multiple rows per participant, because row names must be unique (which can be problematic when there are, for instance, several measures per individual).
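To make the row names point concrete, here is a minimal R sketch. The data, the participant identifiers and the column name "id" are made up for illustration; this is not how Opal builds its tables.

```r
library(tibble)

# Hypothetical table with participant identifiers stored as row names
df <- data.frame(age = c(34, 51), row.names = c("P1", "P2"))

# Row names must be unique, so a second row for "P1" is rejected;
# the error message is captured instead of stopping the script
dup <- tryCatch(
  data.frame(age = c(34, 35), row.names = c("P1", "P1")),
  error = function(e) conditionMessage(e)
)

# Converting to a tibble while keeping the identifiers as a regular column;
# the column name "id" is an arbitrary choice for this example
tbl <- as_tibble(df, rownames = "id")

# With identifiers in a column, several measures per participant are possible
long <- tibble(id = c("P1", "P1", "P2"), age = c(34, 35, 51))

# The identifier column now shows up among the column names
colnames(tbl)
```

Here `dup` holds an error message rather than a data frame, while `long` is a valid tibble with two rows for the same participant.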
The impact of switching to tibbles will be:
- the checkClass() functions assume that the class is a single string, but in the case of a tibble the class names returned are “tbl” and “data.frame” (there can be even more). This breaks the current checkClass() implementation and the subsequent class comparisons (the %in% operator is to be used for those comparisons instead).
- there will be a new column for the identifiers, which will appear in the result of a colnames() call.
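The class issue in the first point can be sketched in R as follows. The helper `is_data_frame()` is hypothetical and only illustrates the idea of a membership test; it is not the actual dsBase or dsBaseClient code.

```r
library(tibble)

# Hypothetical tibble with an identifier column
tbl <- tibble(id = c("P1", "P2"), age = c(34, 51))

# A tibble reports several class names, not a single string
cls <- class(tbl)

# Comparing with == yields one logical value per class name,
# which cannot be used as a single TRUE/FALSE condition
eq <- cls == "data.frame"

# Hypothetical helper: a membership test works for any number of class names
is_data_frame <- function(x) "data.frame" %in% class(x)

is_data_frame(tbl)
is_data_frame(data.frame(age = 1))
```

Base R's `inherits(tbl, "data.frame")` performs the same kind of membership check and could be used instead.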
Regarding checkClass(), it is not a big deal and I have already fixed it in the DSI branch of dsBase and dsBaseClient.
Regarding a column with the identifiers, it is up to you to decide whether it is disclosive information, in which case it should be hidden from the client.
I will make a release of Opal next week, with a system setting that makes Opal assign tibbles for Datashield. The default behavior will still be to assign a data.frame, but that setting gives you the opportunity to test the tibble option.
The decision to use tibbles or not will have an impact on the other data repositories that want to integrate with the Datashield platform (Molgenis, in the near future).