WARNING: the useNA='ifany' argument of the ds.table function can produce incorrect results

Hi everyone,

I would like to notify you of an issue with the useNA argument of the ds.table function. This argument defaults to useNA='ifany' and, depending on whether the input variable has missing values in the first study (first in the order in which the studies appear in the list of connections), ds.table might return incorrect tables. This happens when the first study doesn't have any missing values but some of the other studies do.

Here is an example:

If I run the command ds.table('nonrep$sex') across 8 studies without specifying the useNA argument (so it defaults to 'ifany'), I get the following table:

          study
nonrep$sex    1     2     3   4    5    6    7    8
        1  3456 45817 49644 850 9318 1454 1099 7706
        2  3356 43724 47181 828 8811 1414 1037 7355
        NA  830 16504    NA  NA  200   NA  134  584

which is correct because, by chance, study 1 has some missing values.

However, when my connections to the studies are in a different order, or when I specify a particular order of connections in the datasources argument, for example with the command ds.table('nonrep$sex', datasources=connections[c(3,1:2,4:8)]), I get the following table:

          study
nonrep$sex     1    2     3   4    5    6    7    8
        1  49644  830 16504 850  200 1454  134  584
        2  47181 3356 43724 828 8811 1414 1037 7355
        NA    NA   NA    NA  NA   NA   NA   NA   NA

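which is incorrect. For example, the table now indicates that study 2 (which was study 1 in the first table) has no missing values and that the number of males is 830, which is not correct: 830 is in fact the number of missing values in that study. The same happens in every study that has missings (studies 2, 3, 5, 7 and 8 here): its NA count has been placed in the row for category 1, and the NA row is empty.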
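The shape difference behind this can be seen locally with base R's table() on two small made-up vectors (the values are illustrative, not the study data above). This is only a sketch: it does not call ds.table or reproduce the dsBaseClient internals. It just shows that with useNA='ifany' the number of cells depends on whether a vector has missings, whereas useNA='always' gives every input the same cells, which presumably is why the layout of the first study can end up dictating the combined table.

```r
# Hypothetical example vectors (made-up values, not the study data above)
sex_study_a <- c(1, 2, 2, 1, 2)        # no missing values
sex_study_b <- c(1, NA, 2, 1, NA, 2)   # two missing values

# useNA = "ifany": an NA cell is created only when the vector has missings,
# so the two tables end up with different numbers of cells
tab_a_ifany <- table(sex_study_a, useNA = "ifany")
tab_b_ifany <- table(sex_study_b, useNA = "ifany")
print(length(tab_a_ifany))  # 2 cells: 1, 2
print(length(tab_b_ifany))  # 3 cells: 1, 2, NA

# useNA = "always": both tables have the cells 1, 2 and NA (with a zero
# count when there are no missings), so they line up cell by cell
tab_a_always <- table(sex_study_a, useNA = "always")
tab_b_always <- table(sex_study_b, useNA = "always")
combined <- cbind(a = as.vector(tab_a_always), b = as.vector(tab_b_always))
rownames(combined) <- c("1", "2", "NA")
print(combined)
```

With useNA='no' the shapes also agree, because the NA cell is simply never created.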

To avoid any issues with ds.table, and so that you don't have to worry about the order of the studies in your connections, I suggest setting the useNA argument to either 'always' or 'no'. Both options return correct tables (see the tables below).

ds.table('nonrep$sex', datasources=connections[c(3,1:2,4:8)], useNA='always')

          study
nonrep$sex     1    2     3   4    5    6    7    8
        1  49644 3456 45817 850 9318 1454 1099 7706
        2  47181 3356 43724 828 8811 1414 1037 7355
        NA     0  830 16504   0  200    0  134  584

ds.table('nonrep$sex', datasources=connections[c(3,1:2,4:8)], useNA='no')

          study
nonrep$sex     1    2     3   4    5    6    7    8
         1 49644 3456 45817 850 9318 1454 1099 7706
         2 47181 3356 43724 828 8811 1414 1037 7355

In version 6.2 of dsBase/dsBaseClient, which we will release soon, we have changed the default to useNA='always', and users will only be able to choose between useNA='always' and useNA='no' (the useNA='ifany' option will no longer be allowed).