Abstract

With the arrival of the big data era and the rapid growth in the scale of raw data, feature selection has become a basic and critical technology for data mining. However, most previous studies of feature selection methods treat either single features or the feature subset as a whole, and ignore the influence that correlation and redundancy among the features within a subset have on classification results. This paper combines feature subset grouping with factor analysis (FA) and proposes a hybrid feature selection method based on feature subsets generated through factor analysis (FAFS_HFS). First, feature subsets are generated through factor analysis according to the maximum load (maximum explanatory power) of each feature. Then, minimum redundancy maximum relevance (mRMR) and sequential forward selection (SFS) are used to remove redundancy within each feature subset. Finally, the Fisher score (F-score) and feature subset SFS (FS_SFS) are used to evaluate and select feature subsets and to obtain the optimal feature subset. Experiments on 14 datasets show that FAFS_HFS achieves high classification accuracy and dimension reduction on almost all datasets, especially high-dimensional ones, and that it offers excellent efficiency and competitive classification performance compared with the other methods considered.
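Two of the steps named above, grouping features by their maximum factor load and scoring features with a Fisher score, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a factor loading matrix has already been estimated elsewhere, that "maximum load" means the largest absolute loading, and that the Fisher score takes one common multi-class form (between-class scatter divided by within-class scatter); the abstract does not state the exact formula. The function names `group_by_max_loading` and `fisher_score` are hypothetical, and the mRMR, SFS, and FS_SFS stages are not shown.

```python
from collections import defaultdict


def group_by_max_loading(loadings):
    """Assign each feature to the factor on which its absolute loading is largest.

    `loadings` holds one row per feature; each row lists that feature's
    loadings on every factor (assumed to be computed elsewhere).
    Returns {factor_index: [feature_index, ...]}.
    """
    groups = defaultdict(list)
    for feature_idx, row in enumerate(loadings):
        best_factor = max(range(len(row)), key=lambda k: abs(row[k]))
        groups[best_factor].append(feature_idx)
    return dict(groups)


def fisher_score(values, labels):
    """Fisher score of a single feature (assumed form, see lead-in).

    Computes sum_k n_k * (mean_k - mean)^2 divided by the summed squared
    deviations of each sample from its own class mean.
    """
    overall_mean = sum(values) / len(values)
    by_class = defaultdict(list)
    for value, label in zip(values, labels):
        by_class[label].append(value)

    between = 0.0
    within = 0.0
    for class_values in by_class.values():
        n = len(class_values)
        class_mean = sum(class_values) / n
        between += n * (class_mean - overall_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in class_values)

    if within == 0:
        # No spread inside any class: perfectly separated or constant feature.
        return float("inf") if between > 0 else 0.0
    return between / within
```

With the made-up loading matrix `[[0.9, 0.1], [0.2, -0.8], [0.7, 0.3]]`, `group_by_max_loading` returns `{0: [0, 2], 1: [1]}`: features 0 and 2 form one subset and feature 1 forms another. For the toy feature `[1, 2, 3, 4]` with labels `[0, 0, 1, 1]`, `fisher_score` returns `4.0`.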
