Imbalanced classification is a common issue in Machine Learning, particularly when misclassifying minor instances leads to significant costs. In literature, various strategies have been employed to address this problem. These include data-level, algorithm-level, cost-sensitive, and hybrid-level algorithms designed to tackle imbalanced problems. This paper aims to introduce a novel method that simultaneously enhances the ability of classification models to identify patterns more effectively and addresses imbalanced problems while minimizing alterations to the original data distribution. Our proposed framework combines ensemble learning, space partitioning, and the Synthetic Minority Oversampling Technique (SMOTE). This method decomposes the space into balanced sub-spaces and then trains an ensemble classifier based on these sub-spaces using a bagging approach. In the initial step, we develop a Space Partitioning by Metaheuristic algorithm (SPMH) to divide the space into multiple balanced subspaces. In the subsequent step, we present Imbalanced Classification by SPMH (ICSPMH) as a solution to imbalanced class problems. ICSPMH uses SPMH multiple times to divide the space into different sub-spaces, creating various sub-spaces each time. It then trains different classifiers for each portion of the space, creating an ensemble classifier through a bagging technique. To assess the performance of our proposed framework, we selected 44 well-known datasets for comparison with some state-of-the-art approaches. The results demonstrate that ICSPMH outperforms other competent methods and can potentially reduce the oversampling rate to zero. Additionally, an experiment indicated that the choice of metaheuristic algorithm in SPMH does not significantly impact the final performance. The paper also includes a correlation analysis between oversampling rate and final performance, revealing that the framework effectively eliminates imbalanced data problems with minimal changes to the original dataset. In summary, because ICSPMH applies fewer changes in data distribution and sets up local classifiers that improve classification performance, it looks like a promising method for classifying imbalanced datasets.
Read full abstract