Assigning missing attribute values based on rough sets theory

N. Cercone, Jiye Li

doi:10.1109/grc.2006.1635876

Abstract

We introduce RSFit, a new approach to processing data with missing attribute values based on rough sets theory. By matching attribute-value pairs among the same core or reduct of the original data set, the assigned value preserves the characteristics of the original data set. We compare our approach with the global closest fit approach and the concept closest fit approach. Experimental results on UCI data sets and a real geriatric care data set show that our approach achieves comparable accuracy in assigning the missing values while significantly reducing the computation time.

Rough sets theory is used for attribute selection, rule discovery, and many knowledge discovery applications in areas such as data mining, machine learning, and medical modeling. Core and reduct are among the most important concepts in this theory. A reduct is a subset of condition attributes that is sufficient to represent the whole data set. The intersection of all possible reducts is the core. Therefore, by examining only the attributes within the core or reduct for matched or similar attribute-value pairs for the data instance containing the missing attribute values, we assign the most relevant value for the missing attribute. Experiments on UCI data sets and a geriatric care data set demonstrate that our proposed approach to assigning missing attribute values can greatly reduce the computation time while maintaining satisfactory prediction accuracy.
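The notions of reduct and core, and the idea of matching attribute-value pairs only within them, can be illustrated with a small sketch. This is not the RSFit algorithm from the paper, whose details the abstract does not give. It assumes a consistent decision table stored as a list of dictionaries, finds reducts by brute force, and fills a missing value by copying it from the complete row that agrees with the target on the most reduct attributes. The function names (`is_consistent`, `reducts`, `core`, `fill_missing`) and the toy data are hypothetical.

```python
from itertools import combinations

def is_consistent(rows, attrs, decision):
    """True if rows that agree on `attrs` always agree on the decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True

def reducts(rows, conditions, decision):
    """Brute-force search for minimal attribute subsets that keep the
    table consistent (assumes the full table is consistent)."""
    found = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if is_consistent(rows, subset, decision):
                found.append(subset)
    return found

def core(all_reducts):
    """The core is the intersection of all reducts."""
    return set.intersection(*(set(r) for r in all_reducts))

def fill_missing(complete_rows, target, missing_attr, attrs):
    """Assumed matching rule: copy the value from the complete row that
    shares the most attribute-value pairs with `target` on `attrs`."""
    compare = [a for a in attrs if a != missing_attr]
    best = max(complete_rows,
               key=lambda r: sum(r[a] == target[a] for a in compare))
    return best[missing_attr]

# Toy decision table: condition attributes a, b, c and decision d.
table = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
found = reducts(table, ["a", "b", "c"], "d")
table_core = core(found)

# A new instance whose value for b is missing (None).
incomplete = {"a": 1, "b": None, "c": 1, "d": 0}
filled = fill_missing(table, incomplete, "b", ("a", "c"))
```

Restricting the comparison to the attributes of one reduct means fewer attributes are examined per candidate row than in a closest-fit search over all attributes, which is the intuition behind the reported reduction in computation time. The brute-force reduct search is exponential in the number of attributes and is suitable only for small illustrations.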
