SMOTE-based oversampling methods, which extend minority classes by generating synthetic minority class samples, have been widely favored for class-imbalanced classification. However, they usually generate unnecessary noise when the training data are not well separated. Filtering-based oversampling methods are recognized as effective solutions for addressing noise generation: they employ noise filters, built on instance selection methods, to remove suspicious samples. Nevertheless, they suffer from the following issues: a) the noise filters rely heavily on strong assumptions, causing low robustness across datasets; b) each noise filter is designed for a specific oversampling method and is not easily extended to others; and c) the noise filters incur relatively high time consumption. To address noise generation while overcoming issues a)-c), an oversampling framework based on sample subspace optimization with accelerated binary particle swarm optimization (OF-SSO-ABPSO) is proposed. OF-SSO-ABPSO is a wrapper framework compatible with almost all oversampling methods. First, within the framework, a SMOTE-based method is used to generate synthetic minority class samples. Second, a novel accelerated binary particle swarm optimization (ABPSO) algorithm is proposed, featuring a new search space reduction strategy, a new particle update mechanism, and a new fitness function. Third, a novel ABPSO-based sample subspace optimization (SSO-ABPSO) method is proposed and used as a noise filter to remove suspicious noise from both the training set and the synthetic minority class samples. Experiments show that a) OF-SSO-ABPSO improves 6 representative SMOTE variants by addressing noise generation, and b) OF-SSO-ABPSO outperforms 7 state-of-the-art filtering-based oversampling methods.
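For readers unfamiliar with the generation step the framework wraps, the core SMOTE idea is to create synthetic minority samples by linearly interpolating between a minority sample and one of its k nearest minority-class neighbors. The sketch below illustrates only this general idea, not the paper's specific method or its noise filter; all function and parameter names are illustrative assumptions.

```python
import numpy as np

def smote_sample(minority_X, k=5, n_new=10, rng=None):
    """Sketch of the core SMOTE idea: synthesize minority samples by
    interpolating between a sample and one of its k nearest minority
    neighbors. Illustrative only; not the paper's OF-SSO-ABPSO method."""
    rng = np.random.default_rng(rng)
    X = np.asarray(minority_X, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude self-distance
    # Indices of the k nearest minority neighbors of each sample.
    nn = np.argsort(d, axis=1)[:, :k]
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))        # pick a minority sample
        j = nn[i, rng.integers(k)]      # pick one of its k neighbors
        lam = rng.random()              # interpolation factor in [0, 1)
        synth.append(X[i] + lam * (X[j] - X[i]))
    return np.array(synth)
```

Because each synthetic point lies on a segment between two real minority samples, interpolation across a class boundary can place points in majority-class regions; this is exactly the noise the filtering step of the framework is meant to remove.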