Comparison of five iterative imputation methods for multivariate classification

Yushan Liu, Steven D Brown

doi:10.1016/j.chemolab.2012.11.010

Abstract

Imputation methods are often used to fill in the missing values of an incomplete dataset before multivariate statistical methods are applied. In this paper, five iterative imputation methods are compared: general iterative principal component imputation (GIP), singular value decomposition imputation (SVD), regularized expectation maximization with multiple ridge regression (r-EM), regularized expectation maximization with truncated total least squares (t-EM), and multiple imputation by chained equations (MICE). Two criteria, covariance change and classification error change, are used to evaluate imputation performance on one simulated dataset and two published datasets. No single imputation method emerged as the best in all cases examined. Judging from the results on both real datasets, the r-EM imputation method performs well when the proportion of missing data is under 20%. If the proportion of missing data is above 20%, however, the purpose of the analysis should be considered carefully before an imputation method is chosen.
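
To make the idea of iterative imputation and the covariance-change criterion concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a generic iterative truncated-SVD scheme (fill missing entries with column means, then repeatedly replace them with a low-rank reconstruction) and assumes the covariance change can be measured as a relative Frobenius-norm difference; the abstract does not give the exact definitions. The function names `svd_impute` and `covariance_change`, and the parameters `rank`, `max_iter`, and `tol`, are hypothetical.

```python
import numpy as np


def svd_impute(X, rank=2, max_iter=200, tol=1e-6):
    """Illustrative iterative SVD imputation (assumed scheme, not the paper's code).

    Missing entries are marked with NaN. Observed entries are never modified.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 0: start from column-mean imputation.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(max_iter):
        # Center, take a truncated SVD, and rebuild a low-rank approximation.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        # Overwrite only the missing entries with their reconstructed values.
        updated = np.where(missing, approx, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled


def covariance_change(X_complete, X_imputed):
    """Relative Frobenius-norm change in the covariance matrix (assumed definition)."""
    c0 = np.cov(X_complete, rowvar=False)
    c1 = np.cov(X_imputed, rowvar=False)
    return np.linalg.norm(c1 - c0) / np.linalg.norm(c0)
```

In a comparison of the kind the abstract describes, one would delete a known fraction of entries from a complete dataset, call `svd_impute` on the result, and then pass the complete and imputed matrices to `covariance_change`; the classification error change would be obtained analogously by training the same classifier on both matrices.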

Full Text