Abstract

Attribute reduction is one of the most important problems in rough set theory. Conventional attribute reduction algorithms minimize error on seen objects, i.e., they follow empirical risk minimization. In practical applications, however, classification ability on unseen objects, i.e., generalization ability, matters more, so a good reduct should generalize well. The structural risk minimization (SRM) inductive principle is an effective tool for controlling the generalization ability of learning machines, as it considers model complexity and error on seen objects simultaneously. This paper therefore introduces the SRM principle into the definition of attribute significance, proposes that the number of rules effectively characterizes the actual complexity of a rough set-based classifier, and defines a novel complexity-weighted measure of attribute significance. Based on this new measure, a heuristic attribute reduction algorithm called HSRM-R is developed. Ten-fold cross-validation experiments on 21 UCI datasets show that HSRM-R achieves better generalization ability than conventional attribute reduction algorithms based on dependency degree, information entropy, Fisher score, and Laplacian score. Further experiments show that HSRM-R produces fewer rules with a larger support coefficient; that is, it extracts stronger rules, which partly explains its better generalization ability. Although HSRM-R consumes more time than the conventional algorithms, it attains the best classification accuracy on almost all datasets used in the experiments. The proposed HSRM-R algorithm thus offers a theoretically grounded way to guarantee generalization ability when users require high classification accuracy.
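
As a rough illustration of the complexity-weighted idea described above, the following Python sketch performs greedy forward attribute selection in which each candidate attribute's significance is its gain in dependency degree minus a penalty proportional to the increase in the number of induced rules. The quality measure, the penalty weight alpha, and the stopping rule are illustrative assumptions; the exact HSRM-R significance definition is not given in this abstract.

# A minimal sketch of complexity-weighted heuristic attribute reduction,
# assuming a dependency-degree quality measure and using the number of
# decision rules (distinct condition-value patterns) as the complexity term.
# The weighting scheme and stopping rule are illustrative assumptions,
# not the paper's exact HSRM-R definitions.
from collections import defaultdict

def partition(data, attrs):
    """Group row indices by their value pattern on the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(data):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency_degree(data, attrs, decision):
    """Fraction of objects whose block is consistent with the decision (positive region)."""
    if not attrs:
        return 0.0
    consistent = 0
    for block in partition(data, attrs):
        labels = {data[i][decision] for i in block}
        if len(labels) == 1:
            consistent += len(block)
    return consistent / len(data)

def rule_count(data, attrs):
    """Number of distinct condition patterns, used here as a proxy for classifier complexity."""
    return len(partition(data, attrs)) if attrs else 1

def srm_reduct(data, cond_attrs, decision, alpha=0.1):
    """Greedy forward selection: at each step add the attribute with the best
    significance = dependency-degree gain - alpha * normalized rule-count increase."""
    reduct, current = [], 0.0
    target = dependency_degree(data, cond_attrs, decision)
    while current < target:
        best_attr, best_sig = None, float("-inf")
        for a in cond_attrs:
            if a in reduct:
                continue
            cand = reduct + [a]
            gain = dependency_degree(data, cand, decision) - current
            complexity = (rule_count(data, cand) - rule_count(data, reduct)) / len(data)
            sig = gain - alpha * complexity
            if sig > best_sig:
                best_attr, best_sig = a, sig
        reduct.append(best_attr)
        current = dependency_degree(data, reduct, decision)
    return reduct

if __name__ == "__main__":
    # Toy decision table: columns 0-2 are condition attributes, column 3 is the decision.
    table = [
        (1, 0, 1, "yes"),
        (1, 1, 1, "yes"),
        (0, 0, 1, "no"),
        (0, 1, 0, "no"),
        (1, 1, 0, "yes"),
    ]
    print(srm_reduct(table, cond_attrs=[0, 1, 2], decision=3))  # e.g. [0]

On this toy table the sketch selects a single attribute, since it alone yields full dependency while adding the fewest rules; the same trade-off between accuracy gain and rule-count growth is what the complexity-weighted significance is meant to capture.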
