Feature Selection (FS) is an important preprocessing step that is involved in machine learning and data mining tasks for preparing data (especially high-dimensional data) by eliminating irrelevant and redundant features, thus reducing the potential curse of dimensionality of a given large dataset. Consequently, FS is arguably a combinatorial NP-hard problem in which the computational time increases exponentially with an increase in problem complexity. To tackle such a problem type, meta-heuristic techniques have been opted by an increasing number of scholars. Herein, a novel meta-heuristic algorithm, called Sparrow Search Algorithm (SSA), is presented. The SSA still performs poorly on exploratory behavior and exploration-exploitation trade-off because it does not duly stimulate the search within feasible regions, and the exploitation process suffers noticeable stagnation. Therefore, we improve SSA by adopting: i) a strategy for Random Re-positioning of Roaming Agents (3RA); and ii) a novel Local Search Algorithm (LSA), which are algorithmically incorporated into the original SSA structure. To the FS problem, SSA is improved and cloned as a binary variant, namely, the improved Binary SSA (iBSSA), which would strive to select the optimal or near-optimal features from a given dataset while keeping the classification accuracy maximized. For binary conversion, the iBSSA was primarily validated against nine common S-shaped and V-shaped Transfer Functions (TFs), thus producing nine iBSSA variants. To verify the robustness of these variants, three well-known classification techniques, including k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Random Forest (RF) were adopted as fitness evaluators with the proposed iBSSA approach and many other competing algorithms, on 18 multifaceted, multi-scale benchmark datasets from the University of California Irvine (UCI) data repository. Then, the overall best-performing iBSSA variant for each of the three classifiers was compared with binary variants of 12 different well-known meta-heuristic algorithms, including the original SSA (BSSA), Artificial Bee Colony (BABC), Particle Swarm Optimization (BPSO), Bat Algorithm (BBA), Grey Wolf Optimization (BGWO), Whale Optimization Algorithm (BWOA), Grasshopper Optimization Algorithm (BGOA) SailFish Optimizer (BSFO), Harris Hawks Optimization (BHHO), Bird Swarm Algorithm (BBSA), Atom Search Optimization (BASO), and Henry Gas Solubility Optimization (BHGSO). Based on a Wilcoxon’s non-parametric statistical test (alpha =0.05), the superiority of iBSSA with the three classifiers was very evident against counterparts across the vast majority of the selected datasets, achieving a feature size reduction of up to 92% along with up to 100% classification accuracy on some of those datasets.
Read full abstract