Feature selection (FS), an important problem in data science, consists of finding an optimal subset of features by eliminating irrelevant or redundant ones. High-dimensional data remain challenging for the FS methods currently available in the literature. To overcome this limitation, we propose a novel feature selection method, External Attention-Based Feature Ranker for Large-Scale Feature Selection (EAR-FS), which is based on an attention mechanism and a hybrid metaheuristic. EAR-FS comprises three interdependent modules: (1) in training, a multilayer perceptron network endowed with an attention module is trained to fit the dataset; (2) in feature ranking by attention, the trained attention module is used for attention updating and to rank features according to their importance; (3) in subset generation, a two-stage heuristic approach is applied to determine a small number of features that still guarantees high accuracy. The experimental benchmark comprised 26 datasets of small, large and very large sizes, ranging from 15 to 12,533 features. Experiments against state-of-the-art FS algorithms show that our algorithm is efficient at selecting a small number of features from large datasets while maintaining excellent classification accuracy. For instance, EAR-FS reduced the number of features of the 11 Tumor dataset by 97% while maintaining a classifier accuracy of over 93%.
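To make the rank-then-select idea concrete, the following is a minimal sketch and not the EAR-FS implementation. It assumes per-feature attention scores are already available from a trained model, and it replaces the two-stage heuristic, whose details the abstract does not give, with a plain search for the shortest ranked prefix that reaches a target accuracy. The names `rank_features`, `smallest_top_k`, and `nearest_centroid_accuracy` are hypothetical, and the nearest-centroid evaluator is only a toy stand-in for a real classifier.

```python
import numpy as np


def rank_features(attention_scores):
    """Return feature indices ordered from highest to lowest score."""
    scores = np.asarray(attention_scores, dtype=float)
    # A stable sort on the negated scores keeps ties in original index order.
    return np.argsort(-scores, kind="stable")


def smallest_top_k(ranking, evaluate, target_accuracy):
    """Return the shortest ranked prefix whose accuracy meets the target.

    `evaluate` is a caller-supplied (hypothetical) function that maps an
    array of feature indices to a validation accuracy in [0, 1].
    """
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        if evaluate(subset) >= target_accuracy:
            return subset
    return ranking  # target never reached: keep every feature


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val, subset):
    """Toy evaluator: nearest-centroid classification on the chosen columns."""
    subset = np.asarray(subset, dtype=int)
    classes = np.unique(y_train)
    # One centroid per class, computed only over the selected features.
    centroids = np.stack(
        [X_train[y_train == c][:, subset].mean(axis=0) for c in classes]
    )
    # Distance from every validation row to every class centroid.
    diffs = X_val[:, subset][:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    predictions = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predictions == y_val))
```

In use, `evaluate` would typically wrap a classifier trained on the selected columns, for example `lambda s: nearest_centroid_accuracy(X_train, y_train, X_val, y_val, s)`. The prefix search costs one evaluation per candidate size, which is why a cheaper or staged heuristic becomes attractive when the feature count runs into the thousands.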