Abstract

Attribute reduction techniques based on Pawlak rough set theory work only on data sets with discrete attributes. In real-world applications, the domains of some or all attributes of a data set may be continuous, and such attributes must be discretized as a pre-processing step before attribute reduction. In this paper, we propose an algorithm for attribute reduction on continuous data in rough set theory. The proposed algorithm needs no extra information or expert domain knowledge beyond the continuous data set itself. It is based on concepts of rough set theory, namely the principle of indiscernibility, basic cuts, and the discernibility matrix, and it adapts the search techniques provided by the ant colony optimization meta-heuristic. Because ant colony optimization is a graph-based meta-heuristic, we introduce a fully connected graph whose nodes are the basic cuts. We evaluate the proposed algorithm on various data sets from the University of California, Irvine (UCI) Machine Learning Repository. For each data set, a reduced data set is obtained by retaining the attributes in the reduct determined by the proposed algorithm and removing the remaining attributes. The reduced data sets are found to give better classification accuracies than the data sets before attribute reduction when tested with (i) the C4.5 classifier and (ii) the Naive Bayes classifier.
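To make the notion of basic cuts concrete, the following is a minimal sketch, assuming the common rough-set definition of a basic cut as the midpoint between two consecutive distinct values of a continuous attribute. The function names `basic_cuts` and `discerned_pairs` and the toy data are illustrative assumptions, not taken from the paper, and the ant colony search itself is not shown.

```python
from itertools import combinations


def basic_cuts(values):
    """Candidate cuts for one attribute: midpoints between consecutive
    distinct sorted values (assumed definition of a basic cut)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def discerned_pairs(rows, labels, attr, cut):
    """Index pairs of objects with different decision labels whose values
    on attribute `attr` fall on opposite sides of `cut`."""
    pairs = set()
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            lo, hi = sorted((rows[i][attr], rows[j][attr]))
            if lo < cut < hi:
                pairs.add((i, j))
    return pairs


# Hypothetical toy data: four objects, two continuous attributes, binary decision.
rows = [(1.0, 2.0), (2.0, 0.0), (4.0, 3.0), (5.0, 1.0)]
labels = [1, 0, 0, 1]

# Every (attribute, cut) pair is a candidate node of the search graph.
nodes = [
    (attr, cut)
    for attr in range(len(rows[0]))
    for cut in basic_cuts([r[attr] for r in rows])
]
```

In this sketch, each entry of `nodes` corresponds to one node of a fully connected graph of the kind the abstract describes, and `discerned_pairs` indicates which object pairs a given cut separates, the information a discernibility matrix over cuts would record.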
