Imputation of Missing Data for a Continuous Variable with an Ordinal form of Risk Function: When to Apply the Transformation?

Saiedeh Haji-Maghsoudi,Mohammad Baneshi,Behshid Garrusi

doi:10.6000/1929-6029.2014.03.04.6

Saiedeh Haji-Maghsoudi, Mohammad Baneshi + Show 1 more

Open Access

https://doi.org/10.6000/1929-6029.2014.03.04.6

Copy DOI

Abstract

Introduction: Imputation of missing data and selection of appropriate risk function are of importance . Sometimes a variable with continuous nature will be offered to the regression model as an ordinal variable. Our aim is to investigate whether to offer the continuous form of the variable to the imputation phase and its ordinal from to the modeling phase, or whether to offer the ordinal version to both phases. Material and Methods: The outcome and main variable of interest was use of diet as a body change approach, and Body Mass Index (BMI). We randomly deleted 10%, 20%, and 40% of BMI values. In strategies 1 and 2, BMI was offered to the imputation phase as a continuous (BMIC) and ordinal variable (BMIO). Missing data were imputed using linear and polytomous regression respectively. In strategy 1, after imputation, BMIC was categorized (named BMICO) and offered to the modeling phase. In strategy 2, after imputation of BMIO values, this variable was offered to the logistic model (named BMIOO). We compared two strategies at Event Per Variables (EPV) of 75, 10, and 5. Result: At EPVs of 75 and 10 no remarkable difference was seen. However, at EPV of 5, strategy 2 was superior. At 20% and 40% missing rates, strategy 1 was 2.21 and 3.67 times more likely to produce Severe Relative Bias. At high missing rate, power was higher in strategy2 (90% versus 83%). Conclusions: When EPV is low and missing rate is high, categorizing of variable before imputation of missing data produces less SRB and leads to higher power.

Full Text