Ensemble learning-based stability improvement method for feature selection towards performance prediction

Feng Xiang, Yulong Zhao, Meng Zhang, Ying Zuo, Xiaofu Zou, Fei Tao

doi:10.1016/j.jmsy.2024.03.001

Abstract

The uncertainty and complexity of real data collected in industrial production processes make data-based knowledge discovery more difficult. Feature selection is an important step for removing redundant and irrelevant data, so an efficient feature selection method is essential. In this paper, an ensemble learning-driven stable feature selection method is proposed to improve the stability and accuracy of feature selection. Firstly, datasets with different characteristics are generated to increase the diversity of the data segments used for feature selection. Secondly, two criteria (stability and prediction accuracy) are adopted to evaluate the performance weight of each feature selection algorithm, so that the results of high-performance selectors have high priority in the algorithm aggregation process. Thirdly, the feature subsets are weighted and filtered based on expert experience to further ensure their stability. Finally, comparative experiments are conducted to show the effectiveness of the proposed method. Compared with other methods, the proposed one achieves the highest overall stability for feature selection (0.936, measured by the Spearman rank correlation coefficient) and selects a reasonable feature subset for data-driven prediction with a low mean absolute error (0.315 on average).
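As an illustration only, the two ideas named in the abstract (stability measured by the Spearman rank correlation coefficient, and aggregation in which better-performing selectors carry more weight) can be sketched as follows. This is not the authors' implementation. The function names, the tie-free ranking assumption, the use of a mean pairwise correlation as the stability score, and the weighted mean rank as the aggregation rule are all assumptions made for this sketch, and the weights are taken as given rather than derived from stability and prediction accuracy.

```python
from itertools import combinations


def spearman(rank_a, rank_b):
    """Spearman rank correlation of two tie-free rankings of the same features.

    rank_a[i] and rank_b[i] are the ranks (1 = most important) of feature i.
    """
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def stability(rankings):
    """Mean pairwise Spearman correlation over rankings from different data segments.

    Assumption: stability is summarized as the average over all pairs.
    """
    pairs = list(combinations(rankings, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


def aggregate(rankings_by_selector, weights):
    """Weighted mean rank per feature; a lower score means a more important feature.

    Assumption: weights are non-negative performance weights, one per selector,
    normalized here so that they sum to 1.
    """
    total = sum(weights.values())
    n_features = len(next(iter(rankings_by_selector.values())))
    scores = [0.0] * n_features
    for name, ranks in rankings_by_selector.items():
        for i, rank in enumerate(ranks):
            scores[i] += weights[name] / total * rank
    return scores
```

In this sketch, a selector whose rankings barely change across data segments would receive a stability score close to 1, and its ranking would pull the aggregated scores toward its own ordering when it is given a larger weight.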

Full Text