Combinatorial online high‐order interactive feature selection based on dynamic graph convolution network

Wen-Bin Wu, Jun-Jun Sun, Si-Bao Chen, Chris Ding, Bin Luo

doi:10.1016/j.sigpro.2023.109133

Abstract

Traditional feature selection algorithms assume that the sample and feature spaces are known before learning, whereas in practice most data arrive as feature streams or data streams. Existing streaming feature selection algorithms retain relevant features by removing redundant and irrelevant ones based on the interactions between features, but they ignore how many features take part in an interaction. Most existing studies consider only interactions between two features, which does not match most realistic scenarios, where the number of interacting features is unknown. This paper concentrates on high-order interactions between streaming features and proposes Combinatorial High-Order Interactive Feature Selection based on Dynamic GCN and Sparse learning (CHOIFS-DGS). Building on previous definitions of feature interaction, it proposes new metrics that measure the degree of interaction between a newly arrived feature and an already selected feature. CHOIFS-DGS consists of three main parts: low-order online feature selection based on an interaction measure (LO-OIFS), high-order online feature selection based on a dynamic GCN (HO-OIFS), and intra-group sparse feature selection (Group-Sparse). The experiments use two different classifiers and eleven publicly released data sets, including disease-related gene data and data from two classification challenges (the NeurIPS 2003 feature selection challenge and the WCCI 2006 Performance Prediction Challenge). The results show that CHOIFS-DGS significantly improves classification accuracy on all eleven data sets while using a relatively small number of features, thus fulfilling the role of key feature selection. When the three sub-modules are applied separately to select features and compared with the full CHOIFS-DGS, CHOIFS-DGS performs worse than the individual sub-modules on only three data sets and significantly better on the remaining eight, which indicates that combining the three sub-modules enhances model accuracy. Finally, an ablation experiment verifies the necessity of considering higher-order interactions among features by comparing results with the HO-OIFS module against those of the other modules. The model's accuracy improves significantly once the HO-OIFS module is incorporated, demonstrating that considering higher-order interactions between features is essential.
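
To make the notion of feature interaction concrete, the following minimal sketch computes a pairwise interaction gain, I(F1,F2;C) - I(F1;C) - I(F2;C), on discrete toy data. This is one common information-theoretic definition from the literature and is only an assumption here: the abstract does not state the metrics that CHOIFS-DGS actually uses, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_gain(f1, f2, labels):
    """Illustrative measure (not the paper's metric):
    I(F1,F2;C) - I(F1;C) - I(F2;C).
    Positive values suggest the pair is jointly more informative about
    the class than the two features are individually; negative values
    suggest redundancy."""
    joint = list(zip(f1, f2))
    return (mutual_information(joint, labels)
            - mutual_information(f1, labels)
            - mutual_information(f2, labels))


# Toy data: the class is the XOR of two binary features, so neither
# feature is informative alone but the pair determines the class.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
xor_labels = [a ^ b for a, b in zip(f1, f2)]

print(interaction_gain(f1, f2, xor_labels))  # positive: interacting pair
print(interaction_gain(f1, f1, f1))          # negative: redundant copy
```

The XOR case shows why a selector that scores features one at a time would discard both features, and why scoring only pairs can in turn miss interactions that involve three or more features.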

Full Text