How to compare variables of different tables

Hi all! I’m starting to use DataSHIELD now (wish me luck!). I have a basic question. I want to compare two variables from two different tables, to confirm that both tables contain the same ids. I’m running the following:

ds.assign(toAssign = "dataSex$child_id", newobj = "sex_childID", datasources = conns)
ds.assign(toAssign = "dataAge$child_id", newobj = "age_childID", datasources = conns)
ds.Boole(V1 = "sex_childID", V2 = "age_childID", Boolean.operator = "==", newobj = "comparisonID", datasources = conns)
ds.table("comparisonID")

But I’m getting this error: “[Client error: (400) Bad Request]”, and I can’t work out what is wrong. Any recommendations? Many thanks!

Hi,

Nothing immediately jumps out at me. Would it be possible to work out which of the four calls is generating the error message?

Stuart

Hi Stuart,

When I run the call 'ds.Boole(V1 = "sex_childID", V2 = "age_childID", Boolean.operator = "==", newobj = "comparison", datasources = conns)', I obtain this:

$is.object.created
[1] "A data object has been created in all specified data sources"

$validity.check
[1] " invalid in at least one source. See studyside.messages:"

$studyside.messages
$studyside.messages$NINFEA
[1] "NOT ALL OK: there are studysideMessage(s) on this datasource"

Then I run ds.table and get the error “[Client error: (400) Bad Request]”.

Thanks!

Just before you run ds.Boole, could you run:

ds.ls(datasources = conns)

This is to check that all the servers have the expected tables.

Stuart

Hi Marta,

You get this error because ds.Boole works with numeric, factor or logical variables but not with character variables, and child_id is a character variable.
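
If you want to confirm the variable type on the servers, something along these lines may help. This is only a sketch: check_id_classes is a hypothetical helper name, and it assumes dsBaseClient is installed and that conns is the login object you already use in your calls.

```r
# Hypothetical helper: report the server-side class of child_id in both tables.
# Assumes dsBaseClient is installed and `conns` is an existing DataSHIELD
# connection object, as in the calls earlier in this thread.
check_id_classes <- function(conns) {
  list(
    sex = dsBaseClient::ds.class(x = "dataSex$child_id", datasources = conns),
    age = dsBaseClient::ds.class(x = "dataAge$child_id", datasources = conns)
  )
}
```

Calling check_id_classes(conns) in your session should list the class per study; a result of "character" would be consistent with the explanation above.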

Even if there is a way to convert a character variable to a numeric one, I wouldn’t recommend using ds.Boole to confirm that the two dataframes have the same ids. ds.Boole compares two vectors row-wise, so you would need to be sure that the ids in the two vectors are in the same order.
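
To illustrate the ordering problem, here is a small local example in plain base R. It is not DataSHIELD code, and the ids are made up purely for illustration.

```r
# Local base-R illustration only (not DataSHIELD); the ids are made up.
sex_ids <- c("a1", "b2", "c3")
age_ids <- c("c3", "a1", "b2")  # same ids, but in a different order

# Row-wise comparison: element i of one vector against element i of the other.
rowwise_equal <- sex_ids == age_ids

# Order-independent comparison: do both vectors hold the same set of ids?
same_id_set <- setequal(sex_ids, age_ids)

print(rowwise_equal)  # FALSE FALSE FALSE
print(same_id_set)    # TRUE
```

The row-wise comparison reports a mismatch on every row even though both vectors hold exactly the same ids, which is why a row-wise check is not a reliable test here.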

An alternative way to see how many unique ids are shared between the two dataframes is to merge them by child_id using the ds.merge function with the options all.x=FALSE and all.y=FALSE. The function will then create a merged dataframe containing only the rows whose child_id exists in both dataframes. You can then compare the dimensions of dataSex, dataAge and the merged dataframe using the ds.dim function: if the number of rows is the same in all three (and each child_id appears only once per table), dataSex and dataAge contain the same ids.
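
A rough sketch of the merge approach, untested against your server. The names compare_id_dims and mergedIDs are hypothetical, chosen only for this example; it assumes dsBaseClient is installed, that conns is your existing login object, and that dataSex and dataAge are already assigned on the servers.

```r
# Hypothetical helper: inner-join the two tables on child_id and return the
# dimensions of both inputs and of the merged result for comparison.
# Assumes dsBaseClient is installed and `conns` is an existing DataSHIELD
# connection object; "mergedIDs" is an arbitrary name for the new object.
compare_id_dims <- function(conns) {
  # all.x = FALSE and all.y = FALSE keep only ids present in both tables.
  dsBaseClient::ds.merge(
    x.name = "dataSex",
    y.name = "dataAge",
    by.x.names = "child_id",
    by.y.names = "child_id",
    all.x = FALSE,
    all.y = FALSE,
    newobj = "mergedIDs",
    datasources = conns
  )

  # Return the raw ds.dim output for each table so the row counts can be
  # compared by eye.
  list(
    dataSex = dsBaseClient::ds.dim(x = "dataSex", datasources = conns),
    dataAge = dsBaseClient::ds.dim(x = "dataAge", datasources = conns),
    merged  = dsBaseClient::ds.dim(x = "mergedIDs", datasources = conns)
  )
}
```

Running compare_id_dims(conns) should print the three sets of dimensions side by side. If the merged table has fewer rows than either input, some ids are present in only one of the tables.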