Abstract

The performance of classification algorithms is highly dependent on the quality of the training data. Missing attribute values are common in many real-world applications, so a complementary method that improves data quality and, consequently, classifier performance is necessary. Two strategies are commonly employed in practice to deal with this problem: 1) multiple imputation, which often maintains the statistical properties of the original data and usually performs well, at the expense of high computational cost; 2) single imputation, which in general provides a suitable solution for data sets with few missing attribute values, but rarely achieves good results when the number of missing values is high. This paper proposes a new single imputation method which uses Attribute-based Decision Graphs (AbDGs) to estimate the missing values. AbDGs are a new type of data graph which embeds the information contained in the training set into a graph structure, built over pre-defined intervals of values from different attributes. As a consequence, similar data instances induce similar subgraphs when projected onto the AbDG, resulting in distinct patterns of connections. The main contribution of the paper is a well-defined procedure to perform imputation by partially matching instances with missing values against the AbDG. The proposed imputation method can effectively deal with data sets having high rates of missing attribute values while keeping computational cost low, a significant result towards the development of robust expert and intelligent systems. The results obtained provide evidence that the proposed method is sound and produces good-quality imputation for classification purposes.
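
To make the idea of matching an incomplete instance against an interval-based graph concrete, the following is a minimal sketch and not the paper's algorithm. It assumes equal-width intervals, edge weights given by plain co-occurrence counts between intervals of different attributes, and imputation by the mean of the training values in the best-connected interval; the abstract does not specify any of these choices. The names `build_graph` and `impute` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(rows, n_bins=3):
    """Build a toy interval graph from complete rows (lists of floats).

    Assumption: equal-width intervals per attribute; a vertex is the pair
    (attribute index, interval index).
    """
    n_attrs = len(rows[0])
    lows = [min(r[a] for r in rows) for a in range(n_attrs)]
    highs = [max(r[a] for r in rows) for a in range(n_attrs)]

    def bin_of(a, value):
        # Map a value to its interval index, clamped to the valid range.
        width = (highs[a] - lows[a]) / n_bins or 1.0
        return max(0, min(int((value - lows[a]) / width), n_bins - 1))

    weights = defaultdict(int)   # edge weight: co-occurrence count (assumption)
    members = defaultdict(list)  # training values that fell into each vertex
    for r in rows:
        verts = [(a, bin_of(a, r[a])) for a in range(n_attrs)]
        for a, v in enumerate(verts):
            members[v].append(r[a])
        for u, v in combinations(verts, 2):
            weights[(u, v)] += 1
            weights[(v, u)] += 1
    return bin_of, weights, members, n_bins


def impute(row, graph):
    """Fill None entries using the intervals best connected to the observed ones."""
    bin_of, weights, members, n_bins = graph
    # Partial match: only the observed attributes are projected onto the graph.
    observed = [(a, bin_of(a, x)) for a, x in enumerate(row) if x is not None]
    filled = list(row)
    for a, x in enumerate(row):
        if x is None:
            candidates = [(a, b) for b in range(n_bins) if members[(a, b)]]
            best = max(candidates,
                       key=lambda v: sum(weights[(v, u)] for u in observed))
            # Impute with the mean of training values in that interval.
            filled[a] = sum(members[best]) / len(members[best])
    return filled


train = [[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.2, 52.0]]
graph = build_graph(train, n_bins=2)
print(impute([5.1, None], graph))  # -> [5.1, 51.0]
```

In this toy version the cost of imputing one value grows with the number of intervals times the number of observed attributes, which illustrates why a graph-based single imputation can stay cheap; the actual AbDG construction and matching procedure are defined in the full paper.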
