Filters and wrappers represent two mainstream approaches to feature selection (FS). Although evolutionary wrapper-based FS outperforms filters in addressing real-world classification problems, extending these methods to high-dimensional, many-objective optimization problems with imbalanced data poses substantial challenges. Overcoming computational costs and identifying suitable performance metrics are vital for navigating search operation complexities. Here, we propose using the Jaccard similarity (JS) in a set-based evolutionary many-objective (JSEMO) FS search, addressing both evolutionary FS and imbalanced classifier choice concurrently. This study highlights the mutual influence between these aspects, impacting overall algorithm performance. JSEMO integrates JS into population initialization, reproduction, and elitism steps, enhancing diversity and avoiding duplicate solutions. The set-based variation operator utilizes intersection and union operators for compatibility with binary coding. We also introduce a double-weighted KNN (KNN2W) classifier with four supportive objectives as a many-objective FS problem to handle imbalanced distributions. Compared with 20 methods on 15 benchmark problems, JSEMO produces distinct optimal features, significantly improving overall accuracy, balance accuracy, and g-mean metrics with comparable feature set size and computational cost. The ablation study underscores the positive impact of all JSEMO components, highlighting the set-based variation operation with JS and KNN2W with relevant evaluation metrics as the most influential aspects.
Read full abstract