Abstract

In this paper, we address Bayesian classification with incomplete data. The common approach in the literature is to simply discard samples with missing values or to impute the missing values before classification. However, these methods are ineffective when a large portion of the data contains missing values and the acquisition of new samples is expensive. Motivated by these limitations, we propose an expectation-maximization algorithm for learning a multivariate Gaussian mixture model and a multiple kernel density estimator based on propensity scores, which avoid listwise deletion (LD) and mean imputation (MI) when solving classification tasks with incomplete data. We illustrate the effectiveness of the proposed algorithms on artificial and benchmark UCI data sets by comparison with LD and MI methods, and we also apply them to two practical classification tasks: lithology identification of hydrothermal minerals and license plate character recognition. The experimental results demonstrate good performance with high classification accuracies.
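
The following is a minimal sketch, not the authors' implementation, of the kind of EM update used when fitting a Gaussian model directly to data with missing entries instead of deleting or mean-imputing them. It covers only the single-component case (the per-component building block of the mixture model described above); all function names, parameters, and defaults are assumptions introduced for illustration.

```python
# Hedged sketch: EM for a single multivariate Gaussian with values missing at
# random. This is an illustrative building block, not the paper's algorithm.
import numpy as np

def em_gaussian_missing(X, n_iter=50, reg=1e-6):
    """Estimate the mean and covariance of a Gaussian from data with NaNs."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)

    # Initialize with column-wise statistics computed on the observed values.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + reg)

    for _ in range(n_iter):
        X_hat = X.copy()
        C_sum = np.zeros((d, d))

        # E-step: replace each sample's missing block by its conditional mean
        # given the observed block, and accumulate the conditional covariance.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            S_oo = sigma[np.ix_(o, o)] + reg * np.eye(int(o.sum()))
            S_mo = sigma[np.ix_(m, o)]
            K = S_mo @ np.linalg.inv(S_oo)            # regression coefficients
            X_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            C_i = np.zeros((d, d))
            C_i[np.ix_(m, m)] = sigma[np.ix_(m, m)] - K @ S_mo.T
            C_sum += C_i

        # M-step: update the parameters from the completed data.
        mu = X_hat.mean(axis=0)
        diff = X_hat - mu
        sigma = (diff.T @ diff + C_sum) / n + reg * np.eye(d)

    return mu, sigma
```

In the full mixture setting, the same conditional-expectation step is applied per component and weighted by responsibilities computed from the observed coordinates alone, so no sample is discarded and no missing value is filled with a global mean.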
