Hybrid Feature Selection Method Based on Feature Subset and Factor Analysis

Lizeng Gong, Xiaoyan Wang, Shanshan Xie, Mengyao Wang, Yan Zhang

doi:10.1109/access.2022.3222812

Abstract

With the advent of the big data era and the rapid growth in the scale of raw data, feature selection has become a basic and critical technology for data mining. However, most previous studies on feature selection treat either single features or the feature set as a whole, and ignore how correlation and redundancy among the features within a feature subset affect the classification results. In this paper, feature subset grouping is combined with factor analysis (FA), and a hybrid feature selection method based on feature subsets generated through factor analysis (FAFS_HFS) is proposed. First, feature subsets are generated through factor analysis according to the maximum loading (maximum explanatory power) of each feature. Then, minimum redundancy maximum relevance (mRMR) and sequential forward selection (SFS) are used to remove the redundancy within each feature subset. Finally, the Fisher score (F-score) and feature subset SFS (FS_SFS) are used to evaluate and select feature subsets and to obtain the optimal feature subset. Experiments are conducted on 14 datasets. The results show that FAFS_HFS achieves high classification accuracy and dimension reduction on almost all datasets, especially high-dimensional ones, and that it offers excellent efficiency and competitive classification performance compared with the other methods considered.
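
The first and last steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `group_by_max_loading` and `fisher_score`, the synthetic data, and the use of unrotated principal-component extraction from the correlation matrix as a stand-in for a full factor analysis are all assumptions made for illustration. The mRMR, SFS, and FS_SFS steps are omitted.

```python
import numpy as np


def group_by_max_loading(X, n_factors):
    """Assign each feature to the factor on which its absolute loading is largest.

    Assumption: loadings come from principal-component extraction of the
    correlation matrix (no rotation), a simple stand-in for factor analysis.
    """
    corr = np.corrcoef(X, rowvar=False)           # feature-by-feature correlations
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]   # indices of the largest eigenvalues
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    owner = np.argmax(np.abs(loadings), axis=1)   # factor with the maximum loading
    return [np.flatnonzero(owner == k) for k in range(n_factors)]


def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)             # small constant avoids division by zero


# Hypothetical data: features 0-3 follow latent factor a, features 4-5 follow b.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
X = np.column_stack(
    [a + 0.1 * rng.normal(size=n) for _ in range(4)]
    + [b + 0.1 * rng.normal(size=n) for _ in range(2)]
)
y = (a > 0).astype(int)                           # the class label depends only on a

groups = group_by_max_loading(X, n_factors=2)     # one feature subset per factor
scores = fisher_score(X, y)                       # higher means more discriminative

for k, g in enumerate(groups):
    print(f"factor {k}: features {g.tolist()}, mean F-score {scores[g].mean():.3f}")
```

In this toy setting the subset tied to factor `a` should receive a much higher mean F-score than the subset tied to `b`, which is the kind of signal a subset-level selection step could use. A real factor analysis would typically add a rotation such as varimax before taking the maximum loading; that step is left out here.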

Full Text