An Evolutionary Missing Data Imputation Method for Pattern Classification

Fábio M. F. Lobato, Vincent W. Tadaiesky, Ádamo L. De Santana, Igor M. Araújo

doi:10.1145/2739482.2768451

Abstract

Data analysis plays an important role in our Information Era; however, most statistical and machine learning algorithms were not developed to tackle the ubiquitous issue of missing values. In pattern classification, several strategies have been proposed to handle this problem. Missing data imputation is the most widely used one, and it can be viewed as an optimization problem whose goal is to reduce the bias imposed by the absence of information. However, most imputation methods are restricted to a single type of variable (categorical or numerical) and usually ignore the information within incomplete instances. To fill these gaps, we propose an evolutionary missing data imputation method for pattern classification, based on a genetic algorithm, which is suitable for mixed-attribute datasets and takes into account both the information in incomplete instances and the model building process (more specifically, the classification accuracy). To assess the performance of our method, we used three algorithms, one from each of three groups of classification methods: 1) rule induction learning, 2) approximate models, and 3) lazy learning. Experiments have shown that the proposed method outperforms some well-established missing value treatment methods.
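The abstract describes the method only at a high level. As a rough illustration of the general idea, and not the authors' algorithm, the Python sketch below evolves one candidate value per missing cell with a simple genetic algorithm and scores each candidate completion by the leave-one-out accuracy of a 1-nearest-neighbour classifier (a lazy learner). Everything concrete here is an assumption: the toy dataset, the hypothetical helper names (`ga_impute`, `loo_1nn_accuracy`, `fill`, `distance`), drawing gene values from each column's observed values, the unnormalized mixed-attribute distance, truncation selection, uniform crossover, and all parameter values.

```python
import random

# Hypothetical sketch (not the paper's method): each instance is a list of
# attribute values, None marks a missing value, numeric columns hold floats
# and categorical columns hold strings.


def missing_cells(data):
    """Positions (row, col) of every missing value; one gene per position."""
    return [(i, j) for i, row in enumerate(data)
            for j, v in enumerate(row) if v is None]


def fill(data, cells, chromosome):
    """Return a copy of data with each missing cell set to its gene value."""
    filled = [list(row) for row in data]
    for (i, j), value in zip(cells, chromosome):
        filled[i][j] = value
    return filled


def distance(a, b, numeric):
    """Mixed-attribute distance: |x - y| on numeric columns,
    0/1 mismatch on categorical columns (unnormalized, for illustration)."""
    return sum(abs(x - y) if numeric[j] else float(x != y)
               for j, (x, y) in enumerate(zip(a, b)))


def loo_1nn_accuracy(rows, labels, numeric):
    """Leave-one-out accuracy of a 1-nearest-neighbour (lazy) classifier."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min((k for k in range(len(rows)) if k != i),
                      key=lambda k: distance(row, rows[k], numeric))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def ga_impute(data, labels, numeric, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Evolve one imputed value per missing cell, scoring each candidate
    completion by the classification accuracy it yields.
    Assumes pop_size >= 4 and every column has at least one observed value."""
    rng = random.Random(seed)
    cells = missing_cells(data)
    if not cells:
        return [list(row) for row in data], loo_1nn_accuracy(data, labels, numeric)
    # Gene alphabet per cell: the observed values of that cell's column,
    # so numeric and categorical attributes are encoded the same way.
    pools = [[row[j] for row in data if row[j] is not None] for _, j in cells]

    def fitness(chrom):
        return loo_1nn_accuracy(fill(data, cells, chrom), labels, numeric)

    population = [[rng.choice(p) for p in pools] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half as parents (elitism).
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then per-gene mutation from the column pool.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [rng.choice(pools[g]) if rng.random() < mutation_rate else v
                     for g, v in enumerate(child)]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return fill(data, cells, best), fitness(best)


# Usage on a tiny hypothetical mixed-attribute dataset.
data = [
    [1.0, "red"],
    [1.2, None],
    [5.0, "blue"],
    [None, "blue"],
    [0.9, "red"],
    [5.3, "blue"],
]
labels = ["A", "A", "B", "B", "A", "B"]
numeric = [True, False]

filled, acc = ga_impute(data, labels, numeric)
print(filled, acc)
```

Drawing genes from observed column values is what lets a single chromosome mix numeric and categorical attributes in this sketch; a fuller version would likely sample numeric genes from a continuous range, normalize the distance per attribute, and swap in other classifiers (for example, a rule inducer) as the fitness model.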

Full Text