Set up for regression (variables in different tables)

Nicolai · 31 January 2021 18:36

Hi guys,

I’m trying to run a simple regression model using variables for just one country (I’ll add more countries later), but I’m not sure how to set up the variables I’ve got. Should I create a “consolidated” table/dataframe first? The situation is this:

1.- I’ve got 3 tables with different dimensions. All three share a common identifier. Right now I am connecting to each table as if they came from different sources (note: all three tables are provided by the same source). Is that OK, or should they be provided already “merged”?

2.- If I am the one who should do the merge, how should I proceed? I’m trying to use ds.merge but I’m a bit confused. What is the difference between a connection, a dataframe, and a table?

3.- Finally, since some regressors are not provided (e.g. I’ve got height and weight but not BMI), I create new objects by subsetting and assigning. Some objects are created from, say, table 1 (via connection 1) and others from table 2 (via connection 2). How can I make them all available for regressions that combine variables provided in different tables with objects derived from individual tables (and connections)?

I would really appreciate it if you could shed some light on this issue. Many thanks in advance,

Nicolai

tombishop · 1 February 2021 19:57

Hi Nicolai,

I would merge the data in Opal by creating a view that uses your 3 tables as the source. The view will join them together. Then connect to the harmonised view from your client. The view creation screen looks like this:

[Screenshot: Opal view creation screen]

I wouldn’t do the merge in DataSHIELD (but others might have different views!).

Tom
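
To make the idea of a joined view concrete, here is a rough sketch in plain Python. It is not Opal or DataSHIELD code, only an illustration of joining three tables on a shared identifier and then deriving BMI from height and weight. The table contents, the column names and the helper `join_on_id` are all hypothetical, and the sketch assumes an inner join (only identifiers present in every table are kept), which may not match how a particular view handles missing identifiers.

```python
# Illustrative only: three hypothetical tables sharing an "id" column.
demographics = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 3, "age": 45},
]
measurements = [
    {"id": 1, "height_m": 1.70, "weight_kg": 65.0},
    {"id": 2, "height_m": 1.80, "weight_kg": 90.0},
]
outcomes = [
    {"id": 1, "sbp": 118},
    {"id": 2, "sbp": 135},
    {"id": 3, "sbp": 127},
]


def join_on_id(*tables, key="id"):
    """Inner-join lists of row dicts on a shared key column (hypothetical helper)."""
    # Index each table by its identifier for direct lookup.
    indexed = [{row[key]: row for row in table} for table in tables]
    # Keep only identifiers that appear in every table.
    common_ids = set(indexed[0]).intersection(*indexed[1:])
    merged_rows = []
    for ident in sorted(common_ids):
        combined = {}
        for index in indexed:
            combined.update(index[ident])
        merged_rows.append(combined)
    return merged_rows


merged = join_on_id(demographics, measurements, outcomes)

# Derive BMI on the merged table so it sits next to every other variable.
for row in merged:
    row["bmi"] = row["weight_kg"] / row["height_m"] ** 2

for row in merged:
    print(row["id"], row["age"], row["sbp"], round(row["bmi"], 1))
```

Once everything sits in one table, provided variables from different source tables and derived ones such as BMI can appear together in a single regression formula, which is the situation the merged view is meant to create.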
