Create new variable using logical function

Hello,

I’d like to create a new binary variable derived from a continuous variable, conditional on another continuous variable. Specifically, I’d like to create a variable sga (small for gestational age) derived from birth_weight, where the condition depends on ga (the subject’s gestational age).

I was initially trying with the “standard” DataSHIELD functions but then realised it was possible to install dsBetaTest on the node, so I was able to try some of the new functions; specifically, I tried ds.make.o(). However, I still cannot figure it out.

The logic I am trying to implement is as follows:

  1. generate the expected birth weight (bw50c) for each subject with ga = X, where X is continuous from 22 to 40 weeks, using a formula based on the known median birth weight at 40 weeks’ gestation

  2. generate the sga indicator variable as follows:

    • sga == 1 if birth_weight < (bw50c - 3*sd)
    • sga == 0 if birth_weight >= (bw50c - 3*sd)

    nb: sd is a known standard deviation.
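To check the logic outside DataSHIELD first, here is a minimal plain-R sketch on a local data frame. It is not DataSHIELD code, and the linear form of the hypothetical expected_bw50c() helper, the 40-week median, the slope and the sd value are placeholders to be replaced by the real formula and constants.

```r
# Plain-R sketch (not DataSHIELD) of the sga derivation on a local data frame.
# ASSUMPTION: bw50c is a hypothetical linear function of ga anchored at the
# median birth weight at 40 weeks; replace it with the real formula.
expected_bw50c <- function(ga, median_bw40, slope_per_week) {
  median_bw40 - slope_per_week * (40 - ga)
}

# sga = 1 if birth_weight < bw50c - 3*sd, 0 otherwise (missing values stay NA)
derive_sga <- function(birth_weight, ga, median_bw40, slope_per_week, sd_value) {
  bw50c <- expected_bw50c(ga, median_bw40, slope_per_week)
  threshold <- bw50c - 3 * sd_value
  as.numeric(birth_weight < threshold)
}

# made-up illustrative values, not real data
df <- data.frame(ga = c(40, 40, 30), birth_weight = c(3000, 1500, 1200))
df$sga <- derive_sga(df$birth_weight, df$ga,
                     median_bw40 = 3500, slope_per_week = 200, sd_value = 450)
print(df$sga)  # 0 1 0
```

Here the thresholds are 2150 for the two 40-week subjects and 150 for the 30-week subject, so only the 1500 g baby at 40 weeks is flagged.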

Is anyone able to advise whether this is possible using the available (including beta) DataSHIELD functions and, if so, how I might do it? I’m probably just being a bit dumb and overlooking something…

Thanks in advance!

Andrei

Hi Andrei,

I have tried to do things like this before, and it seems tricky to me because the only way to apply a condition is the subset function, which splits your data up. So I have set out below the roundabout steps that might get you there. It may be that I too have missed something that would make this much easier… or that there is something in the beta release that would help, like ds.make… or I may have made an error in my code/logic. We are waiting for the beta to become a release before we deploy it on our nodes.

Tom

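# generate the expected BW, assuming it's a simple transformation of ga (placeholder formula)
ds.assign(toAssign = "D$ga*5+4", newobj = "bw50c_temp")

# get 3 SDs. ds.var returns a list rather than a bare number, so extract the variance from it first (its structure depends on your dsBaseClient version). If sd is already known, as in your case, simply use local_3sd = 3*sd
local_var = ds.var(x = 'bw50c_temp')
local_3sd = local_var^0.5*3

# create the threshold vector (paste0 inserts the locally computed number into the server-side expression)
ds.assign(toAssign = paste0("bw50c_temp-", local_3sd), newobj = "bw50c_thresh")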

# generate columns of 0s and 1s to use later as the indicator. This is a shortcut to get vectors the same length as the data set D (ga/ga works because ga is never 0)
ds.assign(toAssign = "D$ga-D$ga", newobj = "zeros")
ds.assign(toAssign = "D$ga/D$ga", newobj = "ones")

# check the BW against the bw50c minus 3 SD value
ds.assign(toAssign = "D$birth_weight-bw50c_thresh", newobj = "new_thresh")

# bind everything, including new_thresh, into one data frame so it can be used for subsetting
ds.cbind(x = c('D', 'zeros', 'ones', 'new_thresh'), newobj = 'D2')

# create two subsets using the condition (new_thresh must be a column of D2): D3 has birth_weight >= threshold, D4 has birth_weight < threshold
ds.subset(x = 'D2', subset = 'D3', logicalOperator = 'new_thresh >=', threshold = 0, datasources = opals)
ds.subset(x = 'D2', subset = 'D4', logicalOperator = 'new_thresh <', threshold = 0, datasources = opals)

# now glue together the columns you need from D3 and D4, always in the same D3-then-D4 order so the rows stay aligned. You will probably have a lot of variables, so you could write a loop
ds.c(c('D3$ga', 'D4$ga'), newobj = 'new_ga')
ds.c(c('D3$sex', 'D4$sex'), newobj = 'new_sex')

# and, most importantly, your new indicator variable: 0 for D3 (not SGA), 1 for D4 (SGA)
ds.c(c('D3$zeros', 'D4$ones'), newobj = 'indicator')

# then combine everything into a data frame
ds.dataframe(c('new_ga', 'new_sex', 'indicator'), newobj = 'final')
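The split-and-recombine approach above can be imitated locally in plain R to see what it produces. This is a sketch on a made-up toy data frame standing in for the server-side D2, with the indicator set to 1 for subjects whose new_thresh is negative, matching the sga definition in the question. Base-R row indexing stands in for the two ds.subset calls; no DataSHIELD functions are involved.

```r
# Toy local stand-in for the server-side data frame D2 (made-up values)
D2 <- data.frame(ga = c(38, 40, 30, 35), sex = c(1, 2, 1, 2),
                 new_thresh = c(500, -100, 0, -20))
D2$zeros <- D2$ga - D2$ga   # vector of 0s the same length as D2
D2$ones  <- D2$ga / D2$ga   # vector of 1s (fine because ga is never 0)

# split on the condition, as the two subset calls do
D3 <- D2[D2$new_thresh >= 0, ]   # birth_weight >= threshold: not SGA
D4 <- D2[D2$new_thresh <  0, ]   # birth_weight <  threshold: SGA

# recombine: every column is concatenated in the same D3-then-D4 order
final <- data.frame(new_ga    = c(D3$ga, D4$ga),
                    new_sex   = c(D3$sex, D4$sex),
                    indicator = c(D3$zeros, D4$ones))
print(final)
```

The recombined rows come out in D3-then-D4 order (ga 38, 30, 40, 35), not in the original order of D2, which is why every column you need has to be rebuilt from D3 and D4 rather than mixed with columns of the original D.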

Hi Andrei and Tom,

The new ds.Boole.o function makes this much easier. I think you will be able to generate the binary variable using the following logic:

  1. Assign a new variable BW equal to bw50c - 3*sd (replace sd with its actual numeric value): ds.make.o(toAssign="bw50c-3*sd", newobj="BW", datasources=xxxx)

  2. Use the ds.Boole.o function to create the binary variable: ds.Boole.o(V1="birth_weight", V2="BW", Boolean.operator="<", numeric.output=TRUE, newobj="sga", datasources=xxxx)

Remember to replace xxxx with the name of your datasources.
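Because the toAssign expression is evaluated on the server, which cannot see local R variables, the numeric value of 3*sd has to be written into the string itself. A minimal sketch, assuming the dsBetaTestClient functions and argument names quoted above; threshold_expression() and make_sga() are hypothetical helper names, and 450 is only an illustrative sd value.

```r
# Builds the toAssign string for ds.make.o with 3*sd already evaluated to a number
threshold_expression <- function(sd_value, expected_var = "bw50c") {
  paste0(expected_var, "-", 3 * sd_value)
}

# Hypothetical wrapper around the two calls suggested above. Defining it does
# not contact any server; calling it needs dsBetaTestClient loaded and a
# logged-in DataSHIELD session.
make_sga <- function(sd_value, datasources) {
  ds.make.o(toAssign = threshold_expression(sd_value), newobj = "BW",
            datasources = datasources)
  ds.Boole.o(V1 = "birth_weight", V2 = "BW", Boolean.operator = "<",
             numeric.output = TRUE, newobj = "sga", datasources = datasources)
}

print(threshold_expression(450))  # "bw50c-1350"
```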

Hi Tom and Demetris,

Thanks for the rapid responses! I’m glad that it’s not just me that has had difficulty with this sort of thing, and delighted to see that there are potentially solutions.

@tombishop, we currently have a testing node rather than the full thing, so I’m pretty much the only one with access, which makes things a bit easier for now. However, I suspect we may also need to wait for the beta to become a release before it is widely deployed across all the nodes in our consortium - I’m looking forward to hearing about progress on that.

@demetris.avraam, you’re right: the ds.Boole.o function looks like exactly what I’ve been looking for. I’m actually away next week and have a ton of other things to do before I leave, but when I get around to testing it, I’ll report back!

Cheers,

Andrei

Hello all,

Thanks again for the advice - I’ve now had the chance to use the code that @demetris.avraam suggested and am delighted to say it worked really well. I now need to think more about how to use it if there are two levels of logic. To be continued…
