Abstract

Feature selection (FS) is an important preprocessing step in machine learning and data mining. In this paper, a new feature subset evaluation method is proposed that constructs a sample graph (SG) over different k-feature subsets and applies community modularity to select features that are highly informative as a group, even though they may not be relevant individually. The relevant independency among the selected features, rather than their irrelevant redundancy, is effectively measured by the community modularity Q value of the sample graph built on the k features. On this basis, an efficient FS method called k-features sample graph feature selection is presented. A key property of this approach is that the discriminative cues of a feature subset with the maximum relevant independency among features can be accurately determined. This community modularity-based method is then verified against the theory of k-means clustering. As the results of several experiments show, the proposed approach is more effective than other state-of-the-art methods.
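The abstract does not spell out the exact graph construction, so the following is only a minimal sketch of the core idea of scoring a candidate feature subset by the community modularity Q of a sample graph. The k-nearest-neighbor graph, the Euclidean distance, the use of class labels as the community assignment, and the greedy forward search are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def sample_graph_modularity(X, y, feature_subset, k=5):
    """Score a candidate feature subset: build a k-NN sample graph on the
    selected features and compute Newman's modularity Q, treating the class
    labels as the community assignment (illustrative sketch only)."""
    Xs = X[:, feature_subset]                     # samples restricted to the subset
    n = Xs.shape[0]

    # Pairwise Euclidean distances in the reduced feature space
    dist = np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                # exclude self-neighbors

    # Symmetric k-nearest-neighbor adjacency matrix over the samples
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:
            A[i, j] = A[j, i] = 1.0

    # Modularity Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)
    degrees = A.sum(axis=1)
    two_m = A.sum()
    same_class = (y[:, None] == y[None, :])
    return ((A - np.outer(degrees, degrees) / two_m) * same_class).sum() / two_m


def greedy_modularity_selection(X, y, num_features, k=5):
    """Greedy forward search (hypothetical helper): repeatedly add the feature
    whose inclusion yields the largest modularity Q of the sample graph."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < num_features and remaining:
        best = max(remaining,
                   key=lambda f: sample_graph_modularity(X, y, selected + [f], k))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The hypothetical greedy_modularity_selection helper merely shows how such a subset score could drive a forward search; any other subset search strategy could be plugged in instead.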

Highlights

  • Feature selection (FS) is widely investigated and utilized in machine learning and data mining research

  • The proposed approach was compared with several popular FS algorithms, including MIFS-U, mRMR, CMIM, Fisher score, Laplacian score[33], RELIEF[62], Simba-sig[63], and Greedy Feature Flip (G-Flip-sig)[63]

  • To address the redundancy problem of ranking-based filter methods, the sample graph over k features, which captures the relevant independency among feature subsets, is utilized rather than conditional mutual information (MI) criteria (a minimal MI-ranking sketch follows this list for contrast)
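For contrast with the subset-level evaluation sketched under the abstract, the kind of univariate ranking this highlight refers to can be illustrated with a simple mutual-information filter. scikit-learn's mutual_info_classif is used here as a stand-in for the MIFS-U / mRMR / CMIM family; it is not the paper's method.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def mi_ranking(X, y, num_features):
    """Univariate mutual-information filter: score each feature against the
    labels independently and keep the top-ranked ones. Because features are
    scored one at a time, several top-ranked features may carry the same
    information -- the redundancy problem that subset-level evaluation avoids."""
    scores = mutual_info_classif(X, y)            # MI of each feature with y
    return np.argsort(scores)[::-1][:num_features]
```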

Introduction

Feature selection (FS) is widely investigated and utilized in machine learning and data mining research. In this context, a feature, also called an attribute or a variable, represents a property of a process or system. The goal of FS is to select subsets of informative features to build models that describe the data and to eliminate redundant or irrelevant noise features so as to improve predictive accuracy[1]. FS maintains the original intrinsic properties of the selected features and facilitates data visualization and understanding[2]. FS has been extensively applied in many areas, such as bioinformatics[3], image retrieval[4], and text classification[5], because of its capabilities

