I’m trying to run a simple regression model using variables for just one country (I’ll add more countries later), but I’m not sure how to set up the variables I’ve got. Should I create a “consolidated” table/dataframe first? The situation is as follows:
1.- I’ve got 3 tables with different dimensions. All three share a common identifier. Right now, I am connecting to each table as if they came from different sources (note that all three tables are provided by the same source). Is that OK, or should they be provided already “merged”?
2.- If I am the one who should do the merge, how should I proceed? I’m trying to use ds.merged but I’m a bit confused. What is the difference between a connection, a dataframe, and a table?
3.- Finally, since some regressors are not provided (e.g. I’ve got height and weight but not BMI), I create new objects by subsetting and assigning. Some objects are created using, say, table 1 (and connection 1), and others are created using table 2 (and connection 2). How can I make all of them available for regressions that combine the variables provided in the different tables with the objects I created from individual tables (and connections)?
I would really appreciate it if you could shed some light on this issue. Many thanks in advance,