Abstract

Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. In the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. The default setting in the implementation of MICE is for imputation models to include variables as linear terms only, with no interactions, but omitting interaction terms may lead to biased results. Using simulated and real datasets, we investigated whether recursive partitioning creates appropriate variability between imputations and yields unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods on a real and a simulated dataset: predictive mean matching with an interaction term in the imputation model in MICE (MICE-Interaction), classification and regression trees (CART) for specifying the imputation model in MICE (MICE-CART), the implementation of random forest (RF) in MICE (MICE-RF), and the MICE-Stratified method. For the real-data evaluation, we selected secondary data and devised an experimental design consisting of 40 scenarios (2 × 5 × 4), which differed by the missingness mechanism (missing at random, MAR, and missing completely at random, MCAR), the rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), and the imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). We randomly drew 700 observations with replacement 300 times and then created the missing data. The evaluation was based on raw bias (RB) and five other measures, each averaged over the repetitions. Next, in a simulation study, we generated data 1000 times with a sample size of 700 and created missing data once for each dataset. The same criteria as for the real data were used to evaluate the methods in all scenarios of the simulation study.
We conclude that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation compared with parametric methods, and that the MICE-Interaction method is always more efficient and convenient for preserving interaction effects than the other methods.

Highlights

  • Methods: Suppose that X^miss is ordered in non-decreasing numbers of missing values in each column

  • We present only the results with respect to the interaction effects in the form of plots. The results of the imputation methods under the missing mechanisms of missing completely at random (MCAR) and missing at random (MAR) are shown in Figures 1–6. They are separated by different measurements

  • For all the coefficients involved in the interaction effect, at all missing percentages and under both missing mechanisms, the MICE-Interaction method led to a negligible percent bias (PB < 2.2%) and had relatively good performance (Table 2)
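
The bias figures quoted above can be illustrated with a small sketch. The text here does not spell out the formulas, so the code assumes the usual definitions: raw bias is the mean estimate over repetitions minus the true value, and percent bias is 100 times the absolute raw bias divided by the true value. The function names and the numbers in the usage lines are hypothetical.

```python
from statistics import mean


def raw_bias(estimates, true_value):
    """Assumed definition: average estimate over repetitions minus the true value."""
    return mean(estimates) - true_value


def percent_bias(estimates, true_value):
    """Assumed definition: 100 * |raw bias / true value|."""
    return 100.0 * abs(raw_bias(estimates, true_value) / true_value)


if __name__ == "__main__":
    # Hypothetical estimates of an interaction coefficient whose true value is 0.5.
    estimates = [0.49, 0.51, 0.53]
    print(raw_bias(estimates, 0.5), percent_bias(estimates, 0.5))
```

Under these assumed definitions, a method meets the PB < 2.2% criterion when its averaged estimate lies within 2.2% of the true coefficient.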


Summary

Methods

Suppose that X^miss is ordered in non-decreasing numbers of missing values in each column.
(1) To fill in the initial values for the missing values, define a matrix Z equal to X^obs; for each X_j^miss, all the X_j^miss values are initially filled in by random draws from the predictive distribution conditional on Z, and the imputed version of X_j^miss is attached to Z prior to incrementing j.
(2) For j = 1, …, K, replace the missing values of X_j with random draws from the predictive distribution conditional on X_{−j}^miss.
(4) Repeat steps 1–3 a number of times (M), resulting in M imputed datasets that are available for analysis.
It is standard to use generalized linear models as the basis of the posterior predictive distribution draws in steps 1 and 2.
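
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the predictive draw uses a one-split regression tree (a stump) with random donors from the matching leaf, as a stand-in for a full CART model; step 3 is not stated in the text above and is assumed here to mean repeating step 2 `n_iter` times; all names (`draw_from_stump`, `chained_tree_imputation`, `n_iter`, `n_imputations`) are hypothetical. Data are column lists with `None` marking missing entries, and every column is assumed to have at least one observed value.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def draw_from_stump(pred_cols, target, miss, rng, min_leaf=3):
    """Draw one value per missing row of `target` from a one-split tree.

    The split is fitted on rows where `target` is observed; each missing row
    gets a randomly chosen observed value from the leaf it falls into.
    """
    obs = [i for i in range(len(target)) if i not in miss]
    best = None  # (error, column index, threshold)
    for c, col in enumerate(pred_cols):
        for t in sorted({col[i] for i in obs})[:-1]:
            left = [target[i] for i in obs if col[i] <= t]
            right = [target[i] for i in obs if col[i] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, c, t)
    draws = {}
    for i in sorted(miss):
        if best is None:  # no admissible split: draw from all observed values
            donors = [target[k] for k in obs]
        else:
            _, c, t = best
            side = pred_cols[c][i] <= t
            donors = [target[k] for k in obs if (pred_cols[c][k] <= t) == side]
        draws[i] = rng.choice(donors)
    return draws


def chained_tree_imputation(columns, n_iter=5, n_imputations=3, seed=0):
    """Return `n_imputations` completed copies of `columns`."""
    rng = random.Random(seed)
    missing = [{i for i, v in enumerate(col) if v is None} for col in columns]
    complete = [j for j in range(len(columns)) if not missing[j]]
    # Incomplete columns in non-decreasing order of their number of missing values.
    incomplete = sorted((j for j in range(len(columns)) if missing[j]),
                        key=lambda j: len(missing[j]))
    results = []
    for _ in range(n_imputations):              # step 4: M imputed datasets
        data = [list(col) for col in columns]
        z = list(complete)                      # step 1: Z starts as the observed columns
        for j in incomplete:
            draws = draw_from_stump([data[k] for k in z], data[j], missing[j], rng)
            for i, v in draws.items():
                data[j][i] = v
            z.append(j)                         # attach the imputed column to Z
        for _ in range(n_iter):                 # assumed step 3: repeat step 2
            for j in incomplete:                # step 2: redraw given all other columns
                others = [data[k] for k in range(len(data)) if k != j]
                draws = draw_from_stump(others, data[j], missing[j], rng)
                for i, v in draws.items():
                    data[j][i] = v
        results.append(data)
    return results


if __name__ == "__main__":
    # Hypothetical toy data: a continuous column, a dummy, and one incomplete column.
    toy = [[float(i) for i in range(12)],
           [0.0, 1.0] * 6,
           [None if i in (2, 7) else 2.0 * i for i in range(12)]]
    print(len(chained_tree_imputation(toy)))
```

Drawing donors from the leaf, rather than predicting the leaf mean, is what keeps variability between the imputed datasets; a generalized linear model would instead draw from a fitted parametric predictive distribution.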
