Abstract

Missing data are common in all types of research and, if not handled properly, can reduce statistical power and bias the results. Multivariate Imputation by Chained Equations (MICE) has emerged as one of the principled methods for addressing missing data. This paper compares several imputation methods applied within the MICE framework. The chained equations approach is very flexible: it can handle various types of data, such as continuous or binary variables, as well as various missing data patterns.

Objectives: To discuss commonly used techniques for handling missing data and the common issues that can arise when these techniques are used. In particular, we focus on different variants of one of the most popular methods, MICE.

Methods/Statistical Analysis: MICE is a statistical method for imputing missing values. The paper focuses on Multiple Imputation using Predictive Mean Matching, Multiple Random Forest Regression Imputation, Multiple Bayesian Regression Imputation, Multiple Linear Regression using Non-Bayesian Imputation, Multiple Classification and Regression Tree (CART) Imputation, and Multiple Linear Regression with Bootstrap Imputation, all applied within MICE, which provides a general framework for analyzing data with missing values.

Findings: We explore Multiple Imputation using MICE through an examination of a sample data set. Our analysis confirms that the power of Multiple Imputation lies in obtaining smaller standard errors and narrower confidence intervals. The smaller the standard error and the narrower the confidence interval, the more accurate the predicted value, which considerably reduces bias and inefficiency. In our results on the sample data set, the standard error and the mean confidence interval length are smallest for Multiple Imputation combined with Bayesian Regression. The density plot also shows that the imputed values are closer to the observed values with this method than with the other methods. The results for Random Forest are quite close to those for Bayesian Regression.

Application/Improvements: These Multiple Imputation methods can be combined with machine learning and Genetic Algorithms on real data sets to further reduce bias and inefficiency.
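To make the chained equations idea concrete, the following is a minimal illustrative sketch in plain Python, not the implementation used in the paper. It assumes only two numeric variables, fills each missing value from a simple linear regression on the other variable plus random noise (a stochastic regression step that ignores parameter uncertainty, roughly in the spirit of the non-Bayesian linear regression variant named above), and pools the mean of the second variable across the completed data sets with Rubin's rules. The function names `fit_line`, `chained_imputation` and `pool_rubin`, the toy data, and the choice of 5 imputations are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return a, b, sd


def chained_imputation(rows, n_iter=10, seed=0):
    """Return one completed copy of rows (list of [x, y], None = missing)."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Step 1: start every missing cell at the observed mean of its column.
    for j in range(2):
        mean_j = statistics.fmean(r[j] for r in rows if r[j] is not None)
        for i, r in enumerate(data):
            if missing[i][j]:
                r[j] = mean_j
    # Step 2: cycle over the columns, re-imputing each one from the other.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # the other column is the predictor
            # Fit only on rows where the target column was actually observed.
            train = [(r[k], r[j]) for i, r in enumerate(data) if not missing[i][j]]
            a, b, sd = fit_line([t[0] for t in train], [t[1] for t in train])
            for i, r in enumerate(data):
                if missing[i][j]:
                    # Prediction plus noise, so imputations vary between runs.
                    r[j] = a + b * r[k] + rng.gauss(0.0, sd)
    return data


def pool_rubin(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates) if m > 1 else 0.0  # between-imputation
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total ** 0.5               # estimate and its standard error


# Hypothetical toy data: two variables with one missing value in each.
rows = [[1.0, 2.1], [2.0, None], [3.0, 6.2], [None, 8.1], [5.0, 9.9], [6.0, 12.2]]
completed_sets = [chained_imputation(rows, seed=s) for s in range(5)]

estimates, variances = [], []
for data in completed_sets:
    ys = [r[1] for r in data]
    estimates.append(statistics.fmean(ys))
    variances.append(statistics.variance(ys) / len(ys))  # variance of the mean

pooled_mean, pooled_se = pool_rubin(estimates, variances)
print(f"pooled mean of y: {pooled_mean:.3f}, standard error: {pooled_se:.3f}")
```

The pooled standard error combines the within-imputation and between-imputation variance, which is why comparing standard errors and confidence interval lengths across imputation methods, as done in the Findings, is meaningful. A full MICE run would use more variables, method-specific models for each variable, and draws that reflect parameter uncertainty.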
