Recode levels to missing

I need to recode the values of a variable that contains

- do not know
- no
- yes
- prefer not to say

into

- do not know → NA
- no → 0
- yes → 1
- prefer not to say → NA
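For reference, the target mapping above can be written in plain base R on a local character vector. This is only a sketch of the intended result, not a DataSHIELD call, and the vector answers is made-up example data.

```r
# Made-up example responses using the labels from the question
answers <- c("do not know", "no", "yes", "prefer not to say", "yes")

# Lookup table: names are the original labels, values are the new codes
lookup <- c("do not know" = NA, "no" = 0, "yes" = 1, "prefer not to say" = NA)

# Index the lookup by label; unname() drops the labels carried over as names
recoded <- unname(lookup[answers])
recoded   # NA 0 1 NA 1
```

A side effect of indexing by name is that any label not present in the lookup also comes back as NA.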

Can anyone tell me how to do it?

ds.recodeValues does not allow two values for missing, and it also fails when I write:

ds.recodeValues("phenotypes$Diabetes.diagnosed.by.doctor", values2replace.vector = c("No", "Yes"), new.values.vector = c("No", "Yes"), missing = c("Do not know"), newobj = "a")

Assigned expr. (a <- recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "No,Yes", ) [] …Error: There are some DataSHIELD errors, list them with datashield.errors()

datashield.errors()
$cohort1
[1] "[Client error: (400) Bad Request] Lexical error at line 2, column 18. Encountered: " " (32), after : "\"Do""

$cohort2
[1] "[Client error: (400) Bad Request] Lexical error at line 2, column 18. Encountered: " " (32), after : "\"Do""

$cohort3
[1] "[Client error: (400) Bad Request] Lexical error at line 2, column 18. Encountered: " " (32), after : "\"Do""

I guess the problem is that the categories contain a space, e.g.

Do not know

Is it possible to do this kind of recoding using dsBaseClient?

Hi Juan,

The "missing" argument specifies the single value that existing NAs should be recoded to, which is why it accepts only one value.
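A local analogue of that behaviour, as described in this reply, can be sketched in base R. fill_missing is a hypothetical helper, not part of dsBase; it assumes missing simply fills values that are already NA, which is why a single value is enough.

```r
# Hypothetical local analogue of the 'missing' argument as described above:
# it fills values that are already NA, so it takes exactly one replacement value
fill_missing <- function(x, missing) {
  stopifnot(length(missing) == 1)
  x[is.na(x)] <- missing
  x
}

fill_missing(c(0, NA, 1), missing = 99)   # 0 99 1
# fill_missing(c(0, NA, 1), missing = c(98, 99)) stops with an error
```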

I think you can achieve what you want with something like this:

ds.recodeValues("phenotypes$Diabetes.diagnosed.by.doctor", values2replace.vector = c("No", "Yes", "Do not know", "prefer not to say"), new.values.vector = c(0, 1, NA, NA), newobj = "a")

Can you try that and let me know if it works?

I get this error:

ds.recodeValues("phenotypes$Diabetes.diagnosed.by.doctor", values2replace.vector = c("No", "Yes", "Do not know", "prefer not to say"), new.values.vector = c(0, 1, NA, NA), newobj = "a2")

Assigned expr. (a2 <- recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "No,Yes,Do not …Error: There are some DataSHIELD errors, list them with datashield.errors()

datashield.errors()
$cohort1
[1] "[Client error: (400) Bad Request] Lexical error at line 1, column 69. Encountered: " " (32), after : "\"No,Yes,Do""

$cohort2
[1] "[Client error: (400) Bad Request] Lexical error at line 1, column 69. Encountered: " " (32), after : "\"No,Yes,Do""

$cohort3
[1] "[Client error: (400) Bad Request] Lexical error at line 1, column 69. Encountered: " " (32), after : "\"No,Yes,Do""

Maybe the problem is the space in the variable's levels (Do not …)

Yes, the parser blocks spaces in character strings.
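The lexical errors above report character code 32, which is a space. The sketch below, in local base R, flags which labels contain a space and builds dot-separated versions like the ones used later in the thread. The labels are example values assumed to match the stored levels, and the actual relabelling still has to happen in the data itself; this only shows the substitution.

```r
# Example labels containing spaces (assumed to match the stored levels)
levs <- c("Do not know", "No", "Prefer not to answer", "Yes")

utf8ToInt(" ")   # 32, the character code reported in the lexical errors

# Labels that would put a space into the string sent to the server
has_space <- grepl(" ", levs, fixed = TRUE)
levs[has_space]

# Dot-separated versions, in the style used later in this thread
dotted <- gsub(" ", ".", levs, fixed = TRUE)
dotted
```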

I checked whether we can convert a factor with character levels to numeric before recoding its levels, but my attempts failed. I also tried using ds.Boole to create dummy variables, but it also does not recognise character levels in the "V2" argument…

I removed the spaces from the factor levels to avoid that problem, and I get an error (once again!):

ds.recodeValues("phenotypes$Diabetes.diagnosed.by.doctor",
                c("Do.not.know", "No", "Prefer.not.to.answer", "Yes", "NA"),
                c(NA, 0, NA, 1, NA), newobj = "diab")

Assigned expr. (diab <- recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "Do.not.know,… Error: There are some DataSHIELD errors, list them with datashield.errors()

datashield.errors()
$cohort1
[1] "Command 'recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "Do.not.know,No,Prefer.not.to.answer,Yes,NA", \n "NA,0,NA,1,NA", NULL)' failed on 'cohort1': Error while evaluating 'is.null(base::assign('diab', value={dsBase::recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "Do.not.know,No,Prefer.not.to.answer,Yes,NA", "NA,0,NA,1,NA", NULL)}))' -> Error : Error: var.name.text argument too long (see nfilter.stringShort)\n"

Also:

ds.recodeValues("phenotypes$Diabetes.diagnosed.by.doctor",
                c("No", "Yes"),
                c(0, 1), missing = c("Do.not.know", "Prefer.not.to.answer"),
                newobj = "diab")

Assigned expr. (diab <- recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "No,Yes", ) [… Error: There are some DataSHIELD errors, list them with datashield.errors()

datashield.errors()
$cohort1
[1] "Command 'recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "No,Yes", \n "0,1", c("Do.not.know", "Prefer.not.to.answer"))' failed on 'cohort1': Error while evaluating 'is.null(base::assign('diab', value={dsBase::recodeValuesDS("phenotypes$Diabetes.diagnosed.by.doctor", "No,Yes", "0,1", base::c("Do.not.know", "Prefer.not.to.answer"))}))' -> Error : Error: var.name.text argument too long (see nfilter.stringShort)\n"

How can I address this issue? This is a simple task that is needed in many situations, especially when using data as "resources".

I guess using nfilter.string rather than nfilter.stringShort in recodeValuesDS.R (datashield/dsBase on GitHub, master branch) would solve the issue. Is there any particular reason why the short filter is used in this function?
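The failing check can be reproduced locally in base R. This is a sketch, not the dsBase source: it assumes the server rejects any string whose nchar() exceeds nfilter.stringShort, with a default of 20 characters, and that the client collapses each vector into one comma-separated string, as the logs above show. too_long is a hypothetical helper.

```r
nfilter_stringShort <- 20  # assumed default of nfilter.stringShort

# Hypothetical helper: TRUE where a string is longer than the limit
too_long <- function(x, limit = nfilter_stringShort) nchar(x) > limit

# Strings sent by the attempt with dotted labels
var_name       <- "phenotypes$Diabetes.diagnosed.by.doctor"
values2replace <- paste(c("Do.not.know", "No", "Prefer.not.to.answer", "Yes", "NA"),
                        collapse = ",")
new_values     <- paste(c(NA, 0, NA, 1, NA), collapse = ",")

nchar(c(var_name, values2replace, new_values))     # 39 42 12
too_long(c(var_name, values2replace, new_values))  # TRUE TRUE FALSE
```

Under these assumptions the variable name alone is 39 characters, which fits both attempts stopping with "var.name.text argument too long".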

Yes, as Xavier said, the function uses nfilter.stringShort, which is set to 20 characters by default. It applies that filter in three checks: on the variable name, on the values2replace vector and on the new.values vector. So you can first copy the variable into an object with a shorter name like this:

ds.make("phenotypes$Diabetes.diagnosed.by.doctor", newobj = "diab")

Then you can call ds.recodeValues in two steps, so that neither values2replace vector is longer than 20 characters, counting the dots and the commas that join the values.

ds.recodeValues("diab", values2replace.vector = c("No", "Yes", "Do.not.know"), new.values.vector = c(0, 1, NA), newobj = "diab")  # note: if newobj has the same name as the input variable, it overwrites the existing one

ds.recodeValues("diab", values2replace.vector = c("Prefer.not.to.answer"), new.values.vector = c(NA), newobj = "diab")
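A quick local check of this two-step workaround, under the same assumptions as before: a limit of 20, strings strictly longer than the limit rejected, and vectors collapsed with commas. lengths_for is a hypothetical helper that computes the three lengths said to be checked.

```r
nfilter_stringShort <- 20  # assumed default of nfilter.stringShort

# Hypothetical helper: lengths of the three strings the reply says are checked
lengths_for <- function(var, from, to) {
  c(var            = nchar(var),
    values2replace = nchar(paste(from, collapse = ",")),
    new.values     = nchar(paste(to, collapse = ",")))
}

first  <- lengths_for("diab", c("No", "Yes", "Do.not.know"), c(0, 1, NA))
second <- lengths_for("diab", "Prefer.not.to.answer", NA)

rbind(first, second)
all(c(first, second) <= nfilter_stringShort)  # TRUE
```

Note that "Prefer.not.to.answer" is exactly 20 characters, so the second call only passes if the check rejects strings longer than the limit, which matches the "more than 20 characters" wording above.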

However, I am working on modifying ds.asNumeric so that it can convert character variables, or factors with character levels, to numeric; you can then do the recoding without any problems. This would also allow spaces in the character strings, because ds.asNumeric will not need to send any of those strings from the client to the server. I will send a pull request with this change by the end of this week.
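The idea of converting to numeric first can be sketched in base R on a local factor. This is not the modified ds.asNumeric described here; it only illustrates that once levels become integer codes, the recoding needs no label strings at all, spaces included. The data are made up, and the sketch assumes the level order is known.

```r
# Made-up factor whose levels contain spaces
diab <- factor(c("Yes", "No", "Do not know", "Prefer not to answer", "No"))

levels(diab)                # sorted: Do not know, No, Prefer not to answer, Yes
codes <- as.numeric(diab)   # position of each value in levels(diab): 4 2 1 3 2

# New value for each level, in the same order as levels(diab)
new_by_level <- c(NA, 0, NA, 1)
recoded <- new_by_level[codes]
recoded                     # 1 0 NA NA 0
```

The weak point is that the codes depend on the level order, so the mapping has to be checked against the actual levels before recoding.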

Also, I am considering removing the nfilter.stringShort checks from ds.recodeValues entirely, because longer character vectors pose no disclosure risk and, in any case, the checks can be bypassed as in the example above.