Abstract
The ability to classify unseen objects, namely generalization ability, remains a long-standing challenge for rough set-based classifiers. Current research mainly focuses on introducing thresholds to tolerate some errors on seen objects. However, both the rationale for introducing thresholds and the selection of their values still lack sufficient theoretical support. The structural risk minimization (SRM) inductive principle is one of the most effective theories for controlling generalization ability; it suggests a trade-off between errors on seen objects and complexity. Therefore, this paper introduces the SRM principle into rough set-based classifiers and proposes an SRM algorithm for rough set-based classifiers, called the SRM-R algorithm. The SRM-R algorithm uses the number of rules to characterize the actual complexity of a rough set-based classifier and obtains the optimal trade-off between errors on seen objects and complexity through genetic multi-objective optimization. Tenfold cross-validation experiments on 12 UCI datasets show that the SRM-R algorithm significantly improves generalization ability compared with the conventional threshold algorithm. In addition, this paper uses two other possible complexity metrics, the number of attributes and the attribute space, to construct corresponding SRM algorithms, and compares their classification accuracy with that of the SRM-R algorithm. The comparison shows that the SRM-R algorithm obtains the best classification accuracy, which indicates that the number of rules characterizes complexity more effectively than the number of attributes or the attribute space. Further experiments show that the SRM-R algorithm obtains fewer rules and a larger support coefficient, which means it extracts stronger rules. To some extent, this explains why it achieves better generalization ability.
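The trade-off between errors on seen objects and the number of rules can be illustrated with a minimal sketch. This is not the SRM-R algorithm itself: it skips the genetic search entirely and only shows the Pareto-dominance test that a multi-objective optimizer relies on. The candidate rule sets, their error rates, and their rule counts are invented for illustration, and the names `Candidate` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# (label, error rate on seen objects, number of rules); all values are made up.
Candidate = Tuple[str, float, int]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates.

    A candidate is dominated if another one is no worse on both
    objectives (error, rule count) and strictly better on at least one.
    Lower is better for both objectives.
    """
    front = []
    for name, err, rules in candidates:
        dominated = any(
            (e <= err and r <= rules) and (e < err or r < rules)
            for _, e, r in candidates
        )
        if not dominated:
            front.append((name, err, rules))
    return front


# Hypothetical rule sets: A fits the seen objects perfectly but needs many rules.
candidates = [
    ("A", 0.00, 40),
    ("B", 0.05, 12),
    ("C", 0.05, 20),
    ("D", 0.12, 5),
    ("E", 0.15, 9),
]

front = pareto_front(candidates)
for name, err, rules in front:
    print(f"{name}: error={err:.2f}, rules={rules}")
```

In this toy example, C and E are discarded because B and D respectively match or beat them on both objectives. The remaining candidates represent different trade-offs between fitting the seen objects and keeping the rule set small; choosing among them is the step that the SRM principle is meant to guide.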