Hi all,
I want to warn you about an issue with the ds.glm function that can arise when you include pre-processed factors as covariates. If a factor covariate has a different reference group in some studies (that is, the levels of the factor are not in the same order across all studies), ds.glm still assumes that the factor has the same reference group in every study. The output then shows a single estimate, labelled with the reference group of the first study included in the analysis, even though the studies were coded inconsistently.
In the example below, the pooled regression on data from 3 studies gives an estimate for gender of -0.1071360 (labelled gender1, so it should be the estimate for gender=1 compared with the reference group gender=0). However, if we run the regression in each study separately, the ds.glm output shows that in study 2 the reference group of the factor gender is the level gender=1 (the coefficient is labelled gender0). We can also confirm this from the output of ds.levels, where the order of the levels of gender is not the same across the 3 studies.
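To see why the order of the levels matters, here is a small local sketch in base R. It does not touch a DataSHIELD server, and the toy data frame is made up purely for illustration. It relies on R's default treatment contrasts, which take the first level as the reference, so the same data give a coefficient named gender1 or gender0 depending on the level order, and the two estimates differ only in sign.

```r
# Toy data, invented for illustration only (not from any of the studies)
d <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  g = c("0", "0", "0", "1", "1", "1", "1", "0", "1", "0")
)

# Same data, two level orders for the factor
d_a <- transform(d, gender = factor(g, levels = c("0", "1")))  # reference "0"
d_b <- transform(d, gender = factor(g, levels = c("1", "0")))  # reference "1"

fit_a <- glm(y ~ gender, family = binomial, data = d_a)
fit_b <- glm(y ~ gender, family = binomial, data = d_b)

# The coefficient is named after the non-reference level
names(coef(fit_a))  # "(Intercept)" "gender1"
names(coef(fit_b))  # "(Intercept)" "gender0"

# Compare the two gender estimates: same magnitude, opposite sign
c(ref0 = unname(coef(fit_a)["gender1"]),
  ref1 = unname(coef(fit_b)["gender0"]))
```

If the gender columns from differently coded studies are combined as though they were the same variable, the pooled coefficient mixes the two codings, which appears to be what happens in the first ds.glm call in the transcript.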
So, to make sure that the pooled regression results from ds.glm are correct, please first check the order of the levels of each factor covariate using the ds.levels function. If the order is not the same in every study, use the ds.changeRefGroup function to set the same reference group for the factor across all studies, and then run ds.glm, which will then return the correct results (in this example the correct estimate for gender is -0.4425188).
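The check described above can be scripted. The sketch below is not part of dsBaseClient: same_level_order is a hypothetical helper name, and it only assumes that its input has the shape printed by ds.levels in the transcript (one entry per study, each with a Levels element). The example list is typed in by hand from that output so that the snippet runs without a server.

```r
# Hypothetical helper (not a DataSHIELD function): returns TRUE if every
# study reports the factor levels in exactly the same order.
same_level_order <- function(levels_out) {
  lev <- lapply(levels_out, function(s) s$Levels)
  all(vapply(lev, function(l) identical(l, lev[[1]]), logical(1)))
}

# Hand-typed copy of the ds.levels output shown in this post
before <- list(
  study1 = list(Levels = c("0", "1"), ValidityMessage = "VALID ANALYSIS"),
  study2 = list(Levels = c("1", "0"), ValidityMessage = "VALID ANALYSIS"),
  study3 = list(Levels = c("0", "1"), ValidityMessage = "VALID ANALYSIS")
)

# Levels as reported after the ds.changeRefGroup call in this post
after <- before
after$study2$Levels <- c("0", "1")

same_level_order(before)  # FALSE
same_level_order(after)   # TRUE
```

In a live session you would pass the result of ds.levels(x = 'gender', datasources = connections) to the helper and, if it returns FALSE, call ds.changeRefGroup as in the transcript before running ds.glm.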
I am planning to add a check to ds.glm so that it returns a warning message to users in such cases. In the meantime, if you have any questions about this, please contact me.
Thanks, Demetris
> ds.glm(formula = 'diabetes~bmi+gender', family = 'binomial', datasources = connections)
Aggregated (glmDS1(diabetes ~ bmi + gender, "binomial", NULL, NULL, NULL)) [===========] 100% / 1s
Iteration 1...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "0,0,0", NULL, NULL, ) [=======] 100% / 1s
CURRENT DEVIANCE: 12375.4497617173
Iteration 2...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-2.17504842265572,0.00871150851295946,...
CURRENT DEVIANCE: 2915.6575776071
Iteration 3...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-3.66029698749236,0.0263127570728691,-...
CURRENT DEVIANCE: 1690.39475563765
Iteration 4...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-5.39969079376931,0.0626465593519114,-...
CURRENT DEVIANCE: 1395.63400256565
Iteration 5...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.18032222933903,0.110785116443226,-0...
CURRENT DEVIANCE: 1338.4226273586
Iteration 6...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-8.05336242853471,0.135793921983864,-0...
CURRENT DEVIANCE: 1333.0006603598
Iteration 7...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-8.16398000198871,0.138884396458108,-0...
CURRENT DEVIANCE: 1332.92722345965
Iteration 8...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-8.1656229295076,0.138929750884362,-0....
CURRENT DEVIANCE: 1332.92720698603
Iteration 9...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-8.16562330293191,0.138929761172965,-0...
CURRENT DEVIANCE: 1332.92720698603
SUMMARY OF MODEL STATE after iteration 9
Current deviance 1332.92720698603 on 8924 degrees of freedom
Convergence criterion TRUE (5.11708255281577e-16)
beta: -8.16562330293191 0.138929761172964 -0.107136049184977
Information matrix overall:
(Intercept) bmi gender1
(Intercept) 131.51623 4045.766 60.57624
bmi 4045.76620 128005.832 1847.21071
gender1 60.57624 1847.211 60.57624
Score vector overall:
[,1]
(Intercept) -3.541611e-12
bmi -1.109370e-10
gender1 -1.627143e-12
Current deviance: 1332.92720698603
$Nvalid
[1] 8927
$Nmissing
[1] 452
$Ntotal
[1] 9379
$disclosure.risk
RISK OF DISCLOSURE
study1 0
study2 0
study3 0
$errorMessage
ERROR MESSAGES
study1 "No errors"
study2 "No errors"
study3 "No errors"
$nsubs
[1] 8927
$iter
[1] 9
$family
Family: binomial
Link function: logit
$formula
[1] "diabetes ~ bmi + gender"
$coefficients
Estimate Std. Error z-value p-value low0.95CI.LP high0.95CI.LP P_OR
(Intercept) -8.1656233 0.53425481 -15.2841362 9.754780e-53 -9.2127435 -7.1185031 0.0002841786
bmi 0.1389298 0.01680753 8.2659218 1.386194e-16 0.1059876 0.1718719 1.1490433897
gender1 -0.1071360 0.17514150 -0.6117114 5.407287e-01 -0.4504071 0.2361350 0.8984034376
low0.95CI.P_OR high0.95CI.P_OR
(Intercept) 9.975003e-05 0.0008093228
bmi 1.111808e+00 1.1875257282
gender1 6.373686e-01 1.2663452237
$dev
[1] 1332.927
$df
[1] 8924
$output.information
[1] "SEE TOP OF OUTPUT FOR INFORMATION ON MISSING DATA AND ERROR MESSAGES"
>
> ds.glm(formula = 'diabetes~bmi+gender', family = 'binomial', datasources = connections[1])
Aggregated (glmDS1(diabetes ~ bmi + gender, "binomial", NULL, NULL, NULL)) [===========] 100% / 0s
Iteration 1...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "0,0,0", NULL, NULL, ) [=======] 100% / 0s
CURRENT DEVIANCE: 2864.08415007369
Iteration 2...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-2.13202979794411,0.00744868082955048,...
CURRENT DEVIANCE: 664.091745207632
Iteration 3...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-3.5312037887809,0.0225403720954142,-0...
CURRENT DEVIANCE: 376.077873636936
Iteration 4...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-5.09595491819132,0.0538611685523527,-...
CURRENT DEVIANCE: 305.060218021482
Iteration 5...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-6.64316058862581,0.0954630406913807,-...
CURRENT DEVIANCE: 290.53336981469
Iteration 6...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.37241744003646,0.116488672050102,-0...
CURRENT DEVIANCE: 289.008148333213
Iteration 7...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.46062108286853,0.118970774445128,-0...
CURRENT DEVIANCE: 288.981069589415
Iteration 8...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.46199501499254,0.119009690846053,-0...
CURRENT DEVIANCE: 288.981056969819
Iteration 9...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.46199544706323,0.119009703477003,-0...
CURRENT DEVIANCE: 288.981056969815
SUMMARY OF MODEL STATE after iteration 9
Current deviance 288.981056969815 on 2063 degrees of freedom
Convergence criterion TRUE (1.19947276616999e-14)
beta: -7.4619954470633 0.119009703477005 -0.609693778835752
Information matrix overall:
(Intercept) bmi gender1
(Intercept) 28.247764 869.6887 8.891516
bmi 869.688718 27696.7482 261.610627
gender1 8.891516 261.6106 8.891516
Score vector overall:
[,1]
(Intercept) -1.926846e-12
bmi -5.533748e-11
gender1 -1.711487e-12
Current deviance: 288.981056969815
$Nvalid
[1] 2066
$Nmissing
[1] 97
$Ntotal
[1] 2163
$disclosure.risk
RISK OF DISCLOSURE
study1 0
$errorMessage
ERROR MESSAGES
study1 "No errors"
$nsubs
[1] 2066
$iter
[1] 9
$family
Family: binomial
Link function: logit
$formula
[1] "diabetes ~ bmi + gender"
$coefficients
Estimate Std. Error z-value p-value low0.95CI.LP high0.95CI.LP P_OR
(Intercept) -7.4619954 1.07344198 -6.951466 3.615096e-12 -9.565903 -5.3580878 0.0005741788
bmi 0.1190097 0.03339485 3.563714 3.656438e-04 0.053557 0.1844624 1.1263808480
gender1 -0.6096938 0.41055755 -1.485039 1.375336e-01 -1.414372 0.1949842 0.5435172801
low0.95CI.P_OR high0.95CI.P_OR
(Intercept) 7.007299e-05 0.004687824
bmi 1.055017e+00 1.202571770
gender1 2.430783e-01 1.215291815
$dev
[1] 288.9811
$df
[1] 2063
$output.information
[1] "SEE TOP OF OUTPUT FOR INFORMATION ON MISSING DATA AND ERROR MESSAGES"
> ds.glm(formula = 'diabetes~bmi+gender', family = 'binomial', datasources = connections[2])
Aggregated (glmDS1(diabetes ~ bmi + gender, "binomial", NULL, NULL, NULL)) [===========] 100% / 0s
Iteration 1...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "0,0,0", NULL, NULL, ) [=======] 100% / 0s
CURRENT DEVIANCE: 4072.93283297024
Iteration 2...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-2.24627544296761,0.0107040207472627,0...
CURRENT DEVIANCE: 956.077881764171
Iteration 3...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-3.87786831037723,0.0323345801998211,0...
CURRENT DEVIANCE: 548.967284423667
Iteration 4...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-5.92977626926902,0.0769132344707401,0...
CURRENT DEVIANCE: 446.162757054575
Iteration 5...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-8.14674303562793,0.135297302043061,0....
CURRENT DEVIANCE: 422.584835929038
Iteration 6...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-9.30614037435908,0.165912147648399,0....
CURRENT DEVIANCE: 419.664329404253
Iteration 7...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-9.49962996098236,0.170750652776734,0....
CURRENT DEVIANCE: 419.59769692462
Iteration 8...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-9.50467644612636,0.170871561506694,0....
CURRENT DEVIANCE: 419.597650514433
Iteration 9...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-9.50467996467969,0.170871641631032,0....
CURRENT DEVIANCE: 419.597650514408
SUMMARY OF MODEL STATE after iteration 9
Current deviance 419.597650514408 on 2935 degrees of freedom
Convergence criterion TRUE (5.91868313099369e-14)
beta: -9.50467996468151 0.170871641631071 0.442407754385301
Information matrix overall:
(Intercept) bmi gender0
(Intercept) 42.34133 1340.5383 28.59491
bmi 1340.53828 43567.8350 922.63471
gender0 28.59491 922.6347 28.59491
Score vector overall:
[,1]
(Intercept) -1.115877e-11
bmi -3.013110e-10
gender0 -2.492645e-12
Current deviance: 419.597650514408
$Nvalid
[1] 2938
$Nmissing
[1] 150
$Ntotal
[1] 3088
$disclosure.risk
RISK OF DISCLOSURE
study2 0
$errorMessage
ERROR MESSAGES
study2 "No errors"
$nsubs
[1] 2938
$iter
[1] 9
$family
Family: binomial
Link function: logit
$formula
[1] "diabetes ~ bmi + gender"
$coefficients
Estimate Std. Error z-value p-value low0.95CI.LP high0.95CI.LP P_OR
(Intercept) -9.5046800 0.95799362 -9.921444 3.358620e-23 -11.3823130 -7.6270470 7.449679e-05
bmi 0.1708716 0.03023733 5.651016 1.595022e-08 0.1116076 0.2301357 1.186338e+00
gender0 0.4424078 0.33301190 1.328504 1.840116e-01 -0.2102836 1.0950991 1.556450e+00
low0.95CI.P_OR high0.95CI.P_OR
(Intercept) 1.139513e-05 0.00048686
bmi 1.118074e+00 1.25877084
gender0 8.103544e-01 2.98947889
$dev
[1] 419.5977
$df
[1] 2935
$output.information
[1] "SEE TOP OF OUTPUT FOR INFORMATION ON MISSING DATA AND ERROR MESSAGES"
> ds.glm(formula = 'diabetes~bmi+gender', family = 'binomial', datasources = connections[3])
Aggregated (glmDS1(diabetes ~ bmi + gender, "binomial", NULL, NULL, NULL)) [===========] 100% / 0s
Iteration 1...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "0,0,0", NULL, NULL, ) [=======] 100% / 0s
CURRENT DEVIANCE: 5438.43277867333
Iteration 2...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-2.11966145035052,0.00706903494537585,...
CURRENT DEVIANCE: 1294.72069118682
Iteration 3...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-3.49132828533595,0.021318854784575,-0...
CURRENT DEVIANCE: 762.985935101846
Iteration 4...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-4.98842420098296,0.0506082609379765,-...
CURRENT DEVIANCE: 639.19675364723
Iteration 5...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-6.42974865101814,0.0893235660463956,-...
CURRENT DEVIANCE: 617.463312221223
Iteration 6...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.11163311520548,0.109571838249463,-0...
CURRENT DEVIANCE: 615.686548946395
Iteration 7...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.19078931189972,0.111919779017506,-0...
CURRENT DEVIANCE: 615.666756661298
Iteration 8...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.19171463936573,0.111947040802439,-0...
CURRENT DEVIANCE: 615.666753696161
SUMMARY OF MODEL STATE after iteration 8
Current deviance 615.666753696161 on 3920 degrees of freedom
Convergence criterion TRUE (4.81535632481671e-09)
beta: -7.1917147736614 0.111947044776525 -0.362363530796833
Information matrix overall:
(Intercept) bmi gender1
(Intercept) 60.61566 1822.1393 22.64236
bmi 1822.13933 56229.3265 660.86093
gender1 22.64236 660.8609 22.64236
Score vector overall:
[,1]
(Intercept) -1.443043e-06
bmi -3.712178e-05
gender1 -9.584138e-07
Current deviance: 615.666753696161
$Nvalid
[1] 3923
$Nmissing
[1] 205
$Ntotal
[1] 4128
$disclosure.risk
RISK OF DISCLOSURE
study3 0
$errorMessage
ERROR MESSAGES
study3 "No errors"
$nsubs
[1] 3923
$iter
[1] 8
$family
Family: binomial
Link function: logit
$formula
[1] "diabetes ~ bmi + gender"
$coefficients
Estimate Std. Error z-value p-value low0.95CI.LP high0.95CI.LP P_OR
(Intercept) -7.1917148 0.82558874 -8.711014 3.011693e-18 -8.80983896 -5.5735906 0.0007522309
bmi 0.1119470 0.02646973 4.229247 2.344747e-05 0.06006732 0.1638268 1.1184536311
gender1 -0.3623635 0.26807060 -1.351747 1.764564e-01 -0.88777226 0.1630452 0.6960292938
low0.95CI.P_OR high0.95CI.P_OR
(Intercept) 0.000149235 0.003782462
bmi 1.061908032 1.178010230
gender1 0.411571607 1.177089889
$dev
[1] 615.6668
$df
[1] 3920
$output.information
[1] "SEE TOP OF OUTPUT FOR INFORMATION ON MISSING DATA AND ERROR MESSAGES"
>
> ds.levels(x = 'gender', datasources = connections)
Aggregated (exists("gender")) [========================================================] 100% / 3s
Aggregated (classDS("gender")) [=======================================================] 100% / 0s
Aggregated (levelsDS(gender)) [========================================================] 100% / 1s
$study1
$study1$Levels
[1] "0" "1"
$study1$ValidityMessage
[1] "VALID ANALYSIS"
$study2
$study2$Levels
[1] "1" "0"
$study2$ValidityMessage
[1] "VALID ANALYSIS"
$study3
$study3$Levels
[1] "0" "1"
$study3$ValidityMessage
[1] "VALID ANALYSIS"
>
> ds.changeRefGroup(x = 'gender', ref = 0, newobj = 'gender', datasources = connections)
Aggregated (exists("gender")) [========================================================] 100% / 1s
Aggregated (classDS("gender")) [=======================================================] 100% / 0s
Assigned expr. (gender <- changeRefGroupDS(gender,'0',FALSE)) [========================] 100% / 1s
Aggregated (exists("gender")) [========================================================] 100% / 1s
>
> ds.levels(x = 'gender', datasources = connections)
Aggregated (exists("gender")) [========================================================] 100% / 1s
Aggregated (classDS("gender")) [=======================================================] 100% / 0s
Aggregated (levelsDS(gender)) [========================================================] 100% / 1s
$study1
$study1$Levels
[1] "0" "1"
$study1$ValidityMessage
[1] "VALID ANALYSIS"
$study2
$study2$Levels
[1] "0" "1"
$study2$ValidityMessage
[1] "VALID ANALYSIS"
$study3
$study3$Levels
[1] "0" "1"
$study3$ValidityMessage
[1] "VALID ANALYSIS"
>
> ds.glm(formula = 'diabetes~bmi+gender', family = 'binomial', datasources = connections)
Aggregated (glmDS1(diabetes ~ bmi + gender, "binomial", NULL, NULL, NULL)) [===========] 100% / 1s
Iteration 1...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "0,0,0", NULL, NULL, ) [=======] 100% / 1s
CURRENT DEVIANCE: 12375.4497617173
Iteration 2...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-2.15622992309117,0.00835653658473421,...
CURRENT DEVIANCE: 2915.15452834427
Iteration 3...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-3.60345579963896,0.0252352254499992,-...
CURRENT DEVIANCE: 1688.85182157007
Iteration 4...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-5.26379007863147,0.0600362362113038,-...
CURRENT DEVIANCE: 1392.2543056008
Iteration 5...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-6.93445026106209,0.105943099087113,-0...
CURRENT DEVIANCE: 1333.38210025201
Iteration 6...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.74236578021945,0.129578089015349,-0...
CURRENT DEVIANCE: 1327.52015296092
Iteration 7...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.84478123492401,0.132513026838178,-0...
CURRENT DEVIANCE: 1327.43126687773
Iteration 8...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.84637183574817,0.132558462251436,-0...
CURRENT DEVIANCE: 1327.43124065026
Iteration 9...
Aggregated (glmDS2(diabetes ~ bmi + gender, "binomial", "-7.84637225890271,0.132558474471549,-0...
CURRENT DEVIANCE: 1327.43124065026
SUMMARY OF MODEL STATE after iteration 9
Current deviance 1327.43124065026 on 8924 degrees of freedom
Convergence criterion TRUE (2.05530688978913e-15)
beta: -7.84637225890274 0.132558474471551 -0.442518769668178
Information matrix overall:
(Intercept) bmi gender1
(Intercept) 131.3925 4041.377 45.296
bmi 4041.3775 127861.635 1342.211
gender1 45.2960 1342.211 45.296
Score vector overall:
[,1]
(Intercept) 1.191935e-12
bmi 4.318679e-11
gender1 -1.723066e-13
Current deviance: 1327.43124065026
$Nvalid
[1] 8927
$Nmissing
[1] 452
$Ntotal
[1] 9379
$disclosure.risk
RISK OF DISCLOSURE
study1 0
study2 0
study3 0
$errorMessage
ERROR MESSAGES
study1 "No errors"
study2 "No errors"
study3 "No errors"
$nsubs
[1] 8927
$iter
[1] 9
$family
Family: binomial
Link function: logit
$formula
[1] "diabetes ~ bmi + gender"
$coefficients
Estimate Std. Error z-value p-value low0.95CI.LP high0.95CI.LP P_OR
(Intercept) -7.8463723 0.54307740 -14.447982 2.581403e-47 -8.91078440 -6.78196011 0.0003910155
bmi 0.1325585 0.01697825 7.807546 5.831236e-15 0.09928171 0.16583524 1.1417457771
gender1 -0.4425188 0.18585796 -2.380951 1.726799e-02 -0.80679368 -0.07824386 0.6424162829
low0.95CI.P_OR high0.95CI.P_OR
(Intercept) 0.0001349078 0.001132765
bmi 1.1043773730 1.180378602
gender1 0.4462867140 0.924738890
$dev
[1] 1327.431
$df
[1] 8924
$output.information
[1] "SEE TOP OF OUTPUT FOR INFORMATION ON MISSING DATA AND ERROR MESSAGES"