Abstract

Directly applying high-dimensional data to machine learning invites the curse of dimensionality and can cause models to overfit. Feature selection effectively reduces the feature set size, but a single feature selection algorithm suffers from instability and poor generalization ability, while ensemble feature selection makes it difficult to find a suitable strategy for aggregating feature subsets. To address these two problems, we propose a feature selection method based on multiple feature subsets extraction and result fusion (FSM), which generates multiple feature subsets to improve stability. The method uses mutual information to mine the relationship between features and categories; fast non-dominated sorting then uses this correlation to place similar features in the same layer. A layer optimization algorithm is proposed to combine the layered features into multiple distinct feature subsets. To evaluate subset quality, FSM jointly applies precision, recall, and F-score to assess and remove ineffective feature subsets. Fusion is applied at the output stage: each of the surviving superior feature subsets trains a classifier, and the classifiers' results are fused into the final output by majority voting, which simplifies the aggregation step of ensemble feature selection. Experiments on 20 well-known datasets show that FSM effectively reduces data dimensionality and improves classification performance over the original datasets, and it compares favorably with other dimensionality reduction algorithms in both classification performance and efficiency.
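The abstract names two concrete ingredients of FSM: mutual information as the feature-category relevance measure, and majority voting to fuse classifier outputs. The paper's layering and layer optimization steps are not specified here, so the sketch below only illustrates those two named ingredients under simplifying assumptions (discrete-valued features, per-sample prediction lists); the function names are ours, not the paper's.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """I(X; Y) in bits for a discrete feature x and class labels y.

    A textbook plug-in estimate from empirical frequencies; the paper may
    use a different estimator or discretization scheme.
    """
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), c in pxy.items():
        p_joint = c / n
        # log2( p(x,y) / (p(x) * p(y)) ), with counts folded in
        mi += p_joint * math.log2(p_joint * n * n / (px[xv] * py[yv]))
    return mi


def majority_vote(predictions):
    """Fuse per-classifier prediction lists by simple majority voting.

    predictions: list of lists, one inner list of labels per classifier.
    Returns one fused label per sample.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions)]
```

For example, a feature identical to the labels carries their full entropy (1 bit for a balanced binary class), while an independent feature scores 0; three classifiers voting 2-to-1 yield the majority label per sample.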
