Abstract

Selecting important attributes (features) from decision information systems plays a vital role in data mining and machine learning tasks. It is an interesting but challenging problem, especially for continuous numerical (real-valued) attributes. Attribute selection methods based on neighborhood rough sets and fuzzy rough sets are well known for dealing effectively with such attributes. However, neighborhood classes in the neighborhood rough set model may describe the characteristics of the data incompletely, while the fuzzy rough set approach remains quite time-consuming because of the complex calculations on fuzzy equivalence classes. To address these limitations, we apply the concept of sets of level α (α-cut sets) from fuzzy set theory to construct α-level fuzzy equivalence classes, which provide a foundation for developing the basic concepts of a new α-level fuzzy rough set model. We show that, owing to the properties of α-cut sets, the α-level fuzzy equivalence classes not only help to significantly reduce the computational cost, but also preserve most of the information about the relationships between objects, and can even reduce some noise in the data. Based on the α-level fuzzy rough set model, we define new reducts and then propose the FSFCF algorithm for attribute subset selection from decision information systems containing continuous data. The proposed method has two main advantages. First, to evaluate and select optimal attributes, we use an α-level fuzzy certainty factor that takes all objects in the universe into account. Second, the FSFCF algorithm follows a hybrid filter–wrapper approach to reduce the size of the selected attribute subset as well as enhance the classification accuracy. Therefore, the proposed method can significantly improve the performance of attribute selection for continuous data.
To verify the effectiveness of FSFCF, we conduct experiments on a variety of real-world data sets. The results demonstrate that the proposed method outperforms the compared state-of-the-art methods in terms of computational time, reduct size, and classification accuracy on almost all of the data sets.
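
As an illustration of the α-cut idea only, the following minimal sketch builds a fuzzy similarity relation on one numerical attribute and thresholds each fuzzy class at level α. The similarity function, the helper names (`fuzzy_similarity`, `alpha_cut`, `alpha_level_class`), the sample values, and the choice to keep memberships at or above α while zeroing the rest are all assumptions made for this example; the abstract does not give the paper's exact definitions, and this is not the FSFCF algorithm.

```python
def fuzzy_similarity(x, y, value_range):
    """Assumed similarity on one numeric attribute: 1 - |x - y| / range."""
    return max(0.0, 1.0 - abs(x - y) / value_range)


def alpha_cut(fuzzy_class, alpha):
    """Standard alpha-cut: indices of objects with membership >= alpha."""
    return {i for i, mu in enumerate(fuzzy_class) if mu >= alpha}


def alpha_level_class(fuzzy_class, alpha):
    """Assumed alpha-level class: keep memberships >= alpha, zero the rest."""
    return [mu if mu >= alpha else 0.0 for mu in fuzzy_class]


# Hypothetical values of a single continuous attribute for four objects.
values = [1.0, 1.5, 5.0, 9.0]
value_range = max(values) - min(values)

# Row i is the fuzzy class of object i: its similarity to every object.
relation = [[fuzzy_similarity(a, b, value_range) for b in values]
            for a in values]

alpha = 0.6
for i, row in enumerate(relation):
    print(i, sorted(alpha_cut(row, alpha)), alpha_level_class(row, alpha))
```

In this sketch, weak similarities below α are discarded, so later computations touch fewer nonzero memberships, while the surviving memberships keep their graded values rather than collapsing to a crisp 0/1 neighborhood.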
