Abstract

Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression is a natural strategy for building imputation models, but limited research has addressed general missing data patterns, where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches to using regularized regression to impute missing values in high-dimensional data under general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate that the proposed MICE approach based on an indirect use of regularized regression is superior in terms of bias. We further illustrate the proposed methods using two data examples.

Highlights

  • Advances in technologies have led to collection of high-dimensional data such as omics data in many biomedical studies where the number of variables is very large and missing data are often present

  • We investigate two approaches for multiple imputation for general missing data patterns in the presence of high-dimensional data

  • Our numerical results demonstrate that the multiple imputation by chained equations (MICE)-IURR approach performs better than the other imputation methods considered in terms of bias, whereas the MICE-DURR approach exhibits large bias and mean square error (MSE)


Summary

Introduction

Advances in technology have led to the collection of high-dimensional data, such as omics data, in many biomedical studies, where the number of variables is very large and missing data are often present. Apart from random forest and k-nearest neighbors (KNN), regularized regression, which allows for simultaneous parameter estimation and variable selection, presents another option for building imputation models in the presence of high-dimensional data. However, there has been limited work on MI methods for high-dimensional data with general missing data patterns, where multiple variables have missing values. Two MI approaches can handle general missing data patterns: one based on joint modeling (JM)[24] and the other based on fully conditional specification, known as multiple imputation by chained equations (MICE), which has been implemented independently by van Buuren et al. (2011)[3] and Raghunathan et al. (1996)[25]. We focus on extending MICE to high-dimensional data settings for handling general missing data patterns.
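To make the chained-equations idea concrete, the following is a minimal sketch in plain Python, not the MICE-DURR or MICE-IURR procedures studied in the paper. It cycles through the incomplete variables, fits a lasso regression of each one on all the others, and replaces its missing entries with a prediction plus Gaussian noise. The function names (soft_threshold, lasso_cd, chained_lasso_impute), the penalty value lam, the mean-based initialization, and the Gaussian residual draw are all assumptions made for illustration. The sketch also ignores uncertainty in the regression coefficients, which a proper MI procedure would need to propagate.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by coordinate descent (illustrative helper, not a library call).

    Minimizes (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1, where X is a
    list of rows. Returns the intercept b0 and the coefficient list b.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho, zj = 0.0, 0.0
            for i in range(n):
                # Prediction that leaves out feature j (partial residual).
                pred = b0 + sum(beta[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
        b0 = sum(y[i] - sum(beta[k] * X[i][k] for k in range(p))
                 for i in range(n)) / n
    return b0, beta


def chained_lasso_impute(data, lam=0.1, n_cycles=5, seed=0):
    """One imputed copy of `data` (list of rows, None marks a missing value)."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    miss = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]
    # Assumed starting point: fill each gap with the observed column mean.
    for j in range(p):
        obs = [data[i][j] for i in range(n) if not miss[i][j]]
        mean_j = sum(obs) / len(obs)
        for i in range(n):
            if miss[i][j]:
                filled[i][j] = mean_j
    for _ in range(n_cycles):
        for j in range(p):
            rows_mis = [i for i in range(n) if miss[i][j]]
            if not rows_mis:
                continue
            rows_obs = [i for i in range(n) if not miss[i][j]]
            others = [k for k in range(p) if k != j]
            # Fit variable j on all other variables, using rows where j is observed.
            X = [[filled[i][k] for k in others] for i in rows_obs]
            y = [filled[i][j] for i in rows_obs]
            b0, beta = lasso_cd(X, y, lam)
            resid = [y[r] - b0 - sum(b * x for b, x in zip(beta, X[r]))
                     for r in range(len(y))]
            sigma = (sum(e * e for e in resid) / max(len(y) - 1, 1)) ** 0.5
            # Impute with the fitted mean plus a Gaussian residual draw.
            for i in rows_mis:
                mu = b0 + sum(b * filled[i][k] for b, k in zip(beta, others))
                filled[i][j] = mu + rng.gauss(0.0, sigma)
    return filled


# Usage: build a small data set with gaps in two variables, then create
# m = 3 imputed copies by varying the seed.
gen = random.Random(1)
rows = []
for _ in range(20):
    x1 = gen.gauss(0.0, 1.0)
    rows.append([x1, x1 + gen.gauss(0.0, 0.3), gen.gauss(0.0, 1.0)])
rows[2][1] = None
rows[5][0] = None
rows[11][1] = None
imputed_sets = [chained_lasso_impute(rows, lam=0.05, seed=s) for s in range(3)]
```

In a full analysis, each imputed data set would be analyzed separately and the results pooled. The sketch covers only the imputation step, and each call returns the state after the final cycle as one imputed data set.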
