Abstract
Background: This study aimed to evaluate five Multiple Imputation (MI) methods in the context of STEP-wise Approach to Surveillance (STEPS) surveys.
Methods: We selected a complete subsample of a STEPS survey data set and devised an experimental design consisting of 45 states (3 × 3 × 5), which differed by rate of simulated missing data, variable transformation, and MI method. In each state, the simulation of missing data followed by MI was repeated 50 times. Evaluation was based on Relative Bias (RB) as well as five other measures, each averaged over the 50 repetitions.
Results: In estimating the mean, Predictive Mean Matching (PMM) and Multiple Imputation by Chained Equation (MICE) could compensate for the nonresponse bias. Ln and Box–Cox (BC) transformations should be applied when the nonresponse rate reaches 40% and 60%, respectively. In estimating a proportion, PMM, MICE, bootstrap expectation maximization algorithm (BEM), and linear regression accompanied by BC transformation could correct for the nonresponse bias. Our findings show that even with a 60% nonresponse rate, some of the MI methods could yield estimates with negligible RB.
Conclusion: The choice of MI method and variable transformation should be made with caution. No single method can be regarded as the best or the worst overall; each method can outperform the others when applied in the right situation. Even within a given situation, one method may be the best in terms of validity while another is the best in terms of precision.
Highlights
In many countries, the noncommunicable disease risk factor survey known as the STEP-wise Approach to Surveillance (STEPS) is regarded as the main source of information on the prevalence of type 2 diabetes
The experimental design of this study consisted of 45 states (3 × 3 × 5), which differed by rate of simulated missing data (20%, 40%, and 60%), variable transformation [no transformation, natural logarithm (Ln), and Box–Cox (BC)], and Multiple Imputation (MI) method [BEM, Multiple Imputation by Chained Equation (MICE), Multivariate Normal Regression (MVN), Linear Regression (LR), and Predictive Mean Matching (PMM)]
Regarding Relative Efficiency (RE), Fraction of Missing Information (FMI), and Relative Variance Increased (RVI), the best performance belongs to PMM supplemented by BC transformation
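The 3 × 3 × 5 design described in the highlights can be enumerated mechanically. The following is a minimal sketch in Python, not the authors' code: the constant and function names are illustrative assumptions, and only the factor levels and the 50 repetitions come from the text.

```python
from itertools import product

# Factor levels as listed in the highlights; all names here are illustrative.
MISSING_RATES = (0.20, 0.40, 0.60)
TRANSFORMATIONS = ("none", "ln", "box-cox")
MI_METHODS = ("BEM", "MICE", "MVN", "LR", "PMM")
N_REPETITIONS = 50  # repetitions per state, as stated in the abstract


def design_states():
    """Return every combination of missing rate, transformation, and MI method."""
    return [
        {"missing_rate": rate, "transformation": transform, "mi_method": method}
        for rate, transform, method in product(
            MISSING_RATES, TRANSFORMATIONS, MI_METHODS
        )
    ]


states = design_states()
print(len(states))                  # 45 states
print(len(states) * N_REPETITIONS)  # 2250 simulate-then-impute runs in total
```

Each dictionary identifies one state; under this reading of the design, a full run would loop over the 45 states and repeat the simulate-then-impute step 50 times in each.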
Summary
The noncommunicable disease risk factor survey known as the STEP-wise Approach to Surveillance (STEPS) is regarded as the main source of information on the prevalence of type 2 diabetes. Biochemical variables, including Fasting Blood Glucose (FBG), contain some amount of missing data (on average about 19% in Iran’s STEPS surveys), which can damage the estimates in terms of both precision and validity. There are three mechanisms under which missing data are produced. Under Missing Completely at Random (MCAR), missingness is independent of the values of other variables as well as of the missing value itself; this kind of missing data reduces the sample size but does not bias the estimates. Under Missing not at Random (MNAR), missingness depends on the missing value itself, even after adjusting for other variables. This study aimed to evaluate five Multiple Imputation (MI) methods in the context of STEPS surveys
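The claim that MCAR shrinks the sample without biasing the mean can be checked with a small simulation. The sketch below rests on several assumptions: the data are synthetic (a log-normal stand-in, not STEPS data), the function names are hypothetical, and relative bias is taken to be the percentage deviation of the estimate from the complete-data value, a common definition that this excerpt does not spell out. It illustrates MCAR only and says nothing about how the study itself simulated missingness.

```python
import random
import statistics


def impose_mcar(values, missing_rate, rng):
    """Blank each value with probability `missing_rate`, independently of all data."""
    return [None if rng.random() < missing_rate else v for v in values]


def relative_bias(estimate, true_value):
    """Percentage deviation of an estimate from the reference value (assumed definition)."""
    return 100.0 * (estimate - true_value) / true_value


rng = random.Random(0)
# Synthetic right-skewed variable standing in for a glucose-like measurement.
complete = [rng.lognormvariate(4.6, 0.2) for _ in range(5000)]
true_mean = statistics.fmean(complete)

rbs = []
for _ in range(50):  # 50 repetitions, mirroring the study design
    observed = [v for v in impose_mcar(complete, 0.40, rng) if v is not None]
    rbs.append(relative_bias(statistics.fmean(observed), true_mean))

avg_rb = statistics.fmean(rbs)
print(f"observed n in last repetition: {len(observed)} of {len(complete)}")
print(f"average relative bias over 50 repetitions: {avg_rb:.3f}%")
```

With missingness unrelated to the values, the complete-case mean stays close to the full-data mean while the number of usable observations drops, which is the loss of precision without loss of validity that the summary describes.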