Dear all,
we would like to adjust for the site effect in GLM by adding a fixed effect per site. For this purpose, we have created binary variables, one for each site, which are 1 when the data are from this site and 0 otherwise. When we add these into the model in GLM, we receive an error. I assume it is due to the fact that the variable has the same value for all individuals from the same site.
Is there a way to adjust for site effects in GLM?
We are still using Version 4 as the project is in a time-critical phase.
Best wishes,
Daniela
Hi Daniela,
I have tried to adjust for the site effect by generating the binary variables as you described, and it does not fail. However, I did this using version 5 of DataSHIELD. Can you send me the code and the error message you get so that I can have a look?
Many thanks,
Demetris
Hi Demetris,
many thanks for your help! I just found the solution.
We tried:
ds.glm("D$bmi~D$cohort_1+D$cohort_2+D$activity",family="gaussian",maxit = 30)
Here, bmi is continuous and activity is categorical. The variable cohort_1 is 1 at the first site and 0 at the other two sites; cohort_2 is 1 at the second site and 0 at the other two sites. The cohort variables are categorical variables.
The error we get is:
Error: Command 'glmDS1(D$bmi~D$cohort_1+D$cohort_2+D$activity, "gaussian", NULL)' failed on 'Site1': Error while evaluating 'dsModelling::glmDS1( D$bmi~D$cohort_1+D$cohort_2+D$activity,"gaussian",NULL )'
We also tried using cohort_3 instead of cohort_1, and using just one of the variables; the result was the same.
When we transform the site variable to numeric, the evaluation runs! Thus, the problem is that the automatic creation of dummies, as is normally done when categorical variables are put into GLM, does not work when only one category is present at a site.
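The behaviour described above can be reproduced in plain R, outside DataSHIELD. This is only a minimal sketch with simulated data for a single site; the data frame `site1` and its values are made up for illustration, and it is an assumption that the server-side failure has the same cause.

```r
# Simulated data for ONE site: every row has cohort_1 == 1 (hypothetical values).
set.seed(1)
site1 <- data.frame(
  bmi      = rnorm(20, mean = 25, sd = 3),
  activity = factor(rep(c("low", "high"), times = 10)),
  cohort_1 = rep(1, 20)
)

# As a factor, the dummy has a single level at this site, so R cannot build contrasts.
site1$cohort_1_factor <- factor(site1$cohort_1)
factor_fit <- tryCatch(
  glm(bmi ~ cohort_1_factor + activity, family = gaussian, data = site1),
  error = function(e) conditionMessage(e)
)
print(factor_fit)  # an error message instead of a fitted model

# As a numeric, the model matrix can be built. At a single site the column is
# collinear with the intercept, so plain R reports its coefficient as NA.
numeric_fit <- glm(bmi ~ cohort_1 + activity, family = gaussian, data = site1)
print(coef(numeric_fit))
```

In a pooled analysis across sites the numeric dummy is no longer constant overall, which is presumably why the model runs once the variable is numeric.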
Best wishes,
Daniela
Hi Daniela,
Yes, I followed the same procedure but created "numeric" dummies. I assume that you used either the ds.make or the ds.assign function to create the cohort_1 and cohort_2 variables. Another function that could be helpful, and which is included in version 5 of DataSHIELD, is ds.Boole. It can be used to create dummy variables and has the argument "numeric.output", which you can set to TRUE in order to create dummy variables of class numeric.
Also, please have a look at the topic "Conversion of factors to numerics", which might affect some of your analysis.
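A rough sketch of what such a ds.Boole call might look like is given here. It assumes a live DataSHIELD login and the dsBaseClient package; the source variable `D$cohort`, the site code `"1"`, the new object name `cohort_1_num`, and the wrapper function are all hypothetical, and the argument names should be checked against the help page of the installed version.

```r
# Sketch only: requires a live DataSHIELD login and the dsBaseClient package.
# `connections` stands for the login object; "D$cohort" and the code "1" are
# hypothetical names used for illustration.
make_cohort_dummy <- function(connections) {
  dsBaseClient::ds.Boole(
    V1 = "D$cohort",            # variable to compare (hypothetical)
    V2 = "1",                   # value identifying the first site (hypothetical)
    Boolean.operator = "==",
    numeric.output = TRUE,      # return a numeric 0/1 variable, not a logical
    newobj = "cohort_1_num",    # name of the new server-side object (hypothetical)
    datasources = connections
  )
}
```

The wrapper only defines the call; nothing is sent to the servers until it is invoked with a valid login object.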
Best wishes,
Demetris