Set up for regression (variables in different tables)

Hi guys,

I’m trying to run a simple regression model using variables for just one country (later I’ll add more countries), but I’m not sure how to set up the bunch of variables that I’ve got (should I create a “consolidated” table/dataframe first?). The thing is that:

1.- I’ve got 3 tables with different dimensions. All three share a common identifier. Right now, I am connecting to each table as if each of them came from a different source (note: these three tables are provided by the same source). Is that ok, or should they be provided already “merged”?

2.- If it’s me who should do the merge, how should I proceed? I’m trying to use ds.merge but I’m a bit confused. What is the difference between a connection, a dataframe, and a table?

3.- Finally, given that some regressors are not provided (e.g. I’ve got height and weight but I haven’t got BMI), I create objects by subsetting and assigning. Some objects are created using, let’s say, table 1 (and connection 1) and others are created using table 2 (and connection 2). How can I make them all available for running regressions that include both variables provided in different tables and objects created from individual tables (and connections)?

I would really appreciate it if you could shed some light on this issue. Many thanks in advance,


Hi Nicolai,

I would merge the data in Opal by creating a view that uses your 3 tables as the source. The view will join them together. Then connect to the harmonised view from your client.

[Screenshot: Opal view creation screen]

I wouldn’t do the merge in DataSHIELD (but others might have different views!).
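
In case it helps to picture what joining on the shared identifier gives you, here is a minimal sketch in plain Python. It is not DataSHIELD or Opal code. The table contents, the column names (`height_m`, `weight_kg`, `smoker`, `sbp`) and the helper `join_on_id` are all made up for illustration, and it assumes an inner join (only identifiers present in all three tables are kept), which may not be how an Opal view treats participants missing from one table.

```python
# Hypothetical data: three tables of different sizes, all keyed by the
# same participant identifier.
table1 = {
    "p1": {"height_m": 1.70, "weight_kg": 65.0},
    "p2": {"height_m": 1.80, "weight_kg": 81.0},
}
table2 = {"p1": {"smoker": 0}, "p2": {"smoker": 1}, "p3": {"smoker": 0}}
table3 = {"p1": {"sbp": 118}, "p2": {"sbp": 131}}


def join_on_id(*tables):
    """Inner join: keep only identifiers that appear in every table."""
    common_ids = set(tables[0])
    for table in tables[1:]:
        common_ids &= set(table)

    merged = {}
    for pid in sorted(common_ids):
        row = {}
        for table in tables:
            row.update(table[pid])  # column names are assumed not to clash
        merged[pid] = row
    return merged


merged = join_on_id(table1, table2, table3)

# Derived regressor that was not provided: BMI = weight (kg) / height (m)^2.
for row in merged.values():
    row["bmi"] = row["weight_kg"] / row["height_m"] ** 2

for pid, row in merged.items():
    print(pid, row)
```

The point is that once everything sits in one table, provided variables and derived ones like BMI live side by side on the same rows, so a single regression can use all of them.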