Hi @twey
There is no dedicated function for model diagnostics in the current version of dsBase. I am developing a function for diagnostic plots, which will be included in one of the next releases.
At the moment you can run some checks for normality and homoscedasticity of the residuals and for linearity, but I have not yet found a way to detect outliers and influential points with the existing functionality.
Here is an example of some checks you can do:
modDS <- ds.glm(formula = 'D$LAB_TRIG~D$GENDER+D$PM_BMI_CONTINUOUS+D$LAB_HDL', family='gaussian')
modDS$coefficients
ds.asNumeric('D$GENDER', newobj = 'gender.n')
ds.table('D$GENDER')
ds.histogram('gender.n', type='combine')
# get the complete cases of the set of variables in the model
ds.dataFrame(c('D$LAB_TRIG','gender.n','D$PM_BMI_CONTINUOUS','D$LAB_HDL'), newobj='Data_regress')
ds.dim('Data_regress')
ds.completeCases(x1='Data_regress', newobj='Data_regress.compl')
ds.dim('Data_regress.compl')
# create fitted values and residuals (the numeric constants below are the estimates reported in modDS$coefficients)
ds.make(toAssign = '1.70005820+((-0.59987524)*Data_regress.compl$gender.n)+(0.05791745*Data_regress.compl$PM_BMI_CONTINUOUS)+((-0.61425296)*Data_regress.compl$LAB_HDL)', newobj = 'fitted_values')
ds.make(toAssign = '(Data_regress.compl$LAB_TRIG-fitted_values)', newobj = 'residuals')
ds.ls()
# check normality of residuals
ds.mean('residuals', type = 'combine')
ds.histogram('residuals', type = 'combine')
# check for linearity and homoscedasticity
ds.scatterPlot(x='fitted_values', y='residuals', type='combine', datasources=connections)
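# optional: standardise the residuals by dividing by their standard deviation (square root of the pooled variance)
#ds.var('residuals', type='combine')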
#sqrt(2.026274)
#ds.make(toAssign = 'residuals/1.423473', newobj = 'std.residuals')
#ds.mean('std.residuals')
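To see what the ds.make steps above are doing, here is a minimal sketch in plain R (no DataSHIELD server needed). It uses simulated data, so the data frame `dat`, its column names and all the numbers are made up for illustration and are not the study values. It builds the fitted values and residuals by hand from the coefficients, checks that they match what `lm` returns, and standardises the residuals as in the commented-out lines.

```r
# Simulated stand-in data (hypothetical; not the real study variables)
set.seed(1)
n <- 200
dat <- data.frame(
  gender = rbinom(n, 1, 0.5),
  bmi    = rnorm(n, mean = 27, sd = 4),
  hdl    = rnorm(n, mean = 1.5, sd = 0.4)
)
dat$trig <- 1.7 - 0.6 * dat$gender + 0.06 * dat$bmi - 0.6 * dat$hdl +
  rnorm(n, sd = 1.4)

# Gaussian linear model, the local analogue of the ds.glm call
fit <- lm(trig ~ gender + bmi + hdl, data = dat)
b <- coef(fit)

# Fitted values and residuals built by hand from the coefficients,
# mirroring the two ds.make calls
fitted_manual <- b[["(Intercept)"]] + b[["gender"]] * dat$gender +
  b[["bmi"]] * dat$bmi + b[["hdl"]] * dat$hdl
resid_manual <- dat$trig - fitted_manual

# Largest discrepancy against R's built-in extractors
max_diff_fitted <- max(abs(fitted_manual - fitted(fit)))
max_diff_resid  <- max(abs(resid_manual - resid(fit)))

# Standardised residuals: divide by the residual standard deviation
std_resid <- resid_manual / sqrt(var(resid_manual))
```

With an intercept in the model the residuals average to zero by construction, so the ds.mean call mainly confirms that the coefficients were typed in correctly; the histogram and the scatter plot are the informative checks.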
Also note that if you use ds.glmSLMA instead of ds.glm, you can create the fitted values with the ds.glmPredict function rather than with ds.make.
Another note is that the points shown in scatter plots (e.g. residuals vs fitted values) are anonymised points and not the actual values. You can find some information about how the anonymisation in the plots works in the paper "Privacy preserving data visualizations" (EPJ Data Science).
If you are also interested in how to calculate other statistics from regressions, such as the Wald test, the likelihood ratio test, type I and II errors, etc., let me know and I can share examples for those too.
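To give a rough idea of why the plotted points differ from the raw data, here is a loose sketch in plain R of one nearest-neighbour style of anonymisation: each point is replaced by the centroid of itself and its closest neighbours. The function `knn_centroids` and its parameter `k` are hypothetical names for this illustration. This is a simplification and should not be read as the exact procedure used by ds.scatterPlot.

```r
# Replace every point by the centroid of its k nearest points (itself included).
# Hypothetical helper for illustration only.
knn_centroids <- function(x, y, k = 3) {
  # Standardise both axes so distances are not dominated by one variable
  z <- cbind(as.numeric(scale(x)), as.numeric(scale(y)))
  d <- as.matrix(dist(z))  # pairwise Euclidean distances
  cent <- t(apply(d, 1, function(row) {
    idx <- order(row)[seq_len(k)]       # indices of the k closest points
    colMeans(z[idx, , drop = FALSE])    # their centroid
  }))
  # Map the centroids back to the original measurement scale
  data.frame(
    x = cent[, 1] * sd(x) + mean(x),
    y = cent[, 2] * sd(y) + mean(y)
  )
}

# Simulated example data (made up)
set.seed(2)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
anon <- knn_centroids(x, y, k = 3)
```

The overall shape of the cloud (trend, spread, funnel patterns) survives, which is what matters for judging linearity and homoscedasticity, but individual extreme points are pulled towards their neighbours. That is one reason outliers are hard to spot from these plots.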
Many thanks,
Demetris