Abstract

Missing attribute values are a recurrent problem in data mining and machine learning. Although there are plenty of techniques to handle this problem, most of them are too simplistic to provide a good estimate of absent attribute values. A very active research area focuses on solving the missing attribute value problem via imputation methods, which replace missing data with substituted values. This paper proposes a new imputation method that uses a special graph, the Complete p-Partite Attribute-based Decision Graph (CpP-AbDG), to estimate the missing values in a consistent and plausible way. The graph is built by dividing the range of each attribute that describes the data into sub-intervals; the sub-intervals are treated as the vertices of the graph. Edges are then established between pairs of distinct vertices, provided they are not related to the same attribute. Finally, the edges and vertices are assigned weights based on the class distributions. The resulting CpP-AbDG has been shown to be a suitable and informative data structure for finding the interval in which a missing attribute value should lie, taking into account all the attributes that describe the data. Results comparing the proposed approach with classical ones, in a computational environment that uses classification performance as the evaluation criterion, show the potential of the method.
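The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the sub-intervals are chosen or how the weights are computed, so the sketch assumes equal-width sub-intervals, raw per-class co-occurrence counts as weights, and a simple "largest summed edge weight" rule for selecting the interval. The names `interval_index`, `build_graph`, and `impute_interval` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def interval_index(value, lo, hi, bins):
    """Map a value to one of `bins` equal-width sub-intervals of [lo, hi].

    Equal-width discretization is an assumption of this sketch.
    """
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)  # clamp so that value == hi stays in range


def build_graph(rows, labels, bins=3):
    """Build per-class vertex and edge counts; None marks a missing value.

    A vertex is a pair (attribute index, sub-interval index). Edges only
    join vertices of different attributes, which makes the graph p-partite
    for p attributes. Using raw class counts as weights is an assumption.
    """
    n_attrs = len(rows[0])
    ranges = []
    for a in range(n_attrs):
        observed = [r[a] for r in rows if r[a] is not None]
        ranges.append((min(observed), max(observed)))

    vertex_w = defaultdict(lambda: defaultdict(int))
    edge_w = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        verts = [
            (a, interval_index(v, ranges[a][0], ranges[a][1], bins))
            for a, v in enumerate(row)
            if v is not None
        ]
        for v in verts:
            vertex_w[v][label] += 1
        # Each row contributes at most one vertex per attribute, so every
        # pair below links two different attributes.
        for u, v in combinations(verts, 2):
            edge_w[(u, v)][label] += 1
    return ranges, vertex_w, edge_w


def impute_interval(row, label, attr, ranges, edge_w, bins=3):
    """Return the (low, high) bounds of the sub-interval of `attr` that is
    best supported by the observed attributes of `row` for class `label`.

    The scoring rule (sum of edge counts) is a hypothetical stand-in for
    the paper's weighting scheme. Ties go to the lowest sub-interval.
    """
    observed = [
        (a, interval_index(v, ranges[a][0], ranges[a][1], bins))
        for a, v in enumerate(row)
        if v is not None and a != attr
    ]

    def support(k):
        candidate = (attr, k)
        total = 0
        for other in observed:
            key = tuple(sorted((candidate, other)))  # edges are keyed by ascending attribute
            total += edge_w.get(key, {}).get(label, 0)
        return total

    best = max(range(bins), key=support)
    lo, hi = ranges[attr]
    width = (hi - lo) / bins
    return lo + best * width, lo + (best + 1) * width


if __name__ == "__main__":
    rows = [[1.0, 10.0], [1.2, 11.0], [9.0, 30.0], [8.5, 29.0]]
    labels = ["a", "a", "b", "b"]
    ranges, vertex_w, edge_w = build_graph(rows, labels)
    # Estimate the interval for the missing first attribute of a class "b" row.
    print(impute_interval([None, 29.5], "b", 0, ranges, edge_w))
```

The sketch returns an interval rather than a single value, matching the abstract's framing of finding the interval in which the missing value should lie; reducing that interval to a point estimate (for example, its midpoint) would be a further assumption.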

