Abstract

Imbalanced learning is an important branch of machine learning. It addresses the challenge of improving classifier accuracy for minority classes in imbalanced datasets. Currently, the mainstream methods for imbalanced learning are the synthetic minority oversampling technique (SMOTE) and its variants, which generate synthetic minority class samples to balance the dataset. However, existing methods suffer from increased sample overlap, exacerbated intra-class imbalance, and sensitivity to parameter settings. These issues make it difficult to generate high-quality minority class samples and can degrade the resampled dataset. To address these challenges, this study proposes a novel overlapping minimization-based over-sampling (OMOS) algorithm for binary imbalanced classification. The OMOS algorithm consists of four steps: clustering, filtering, auto-encoding, and oversampling. In the clustering step, the mean shift algorithm is used to cluster the original dataset and identify clusters that belong to the minority class. In the filtering step, safe samples are selected, that is, samples whose labels remain consistent before and after clustering. In the auto-encoding step, autoencoders capture the distribution characteristics of the safe samples within each minority class cluster. Finally, in the oversampling step, minority class samples are generated from the probability distribution learned from the safe samples. Furthermore, OMOS introduces a novel approach for computing a suitable sampling rate for each minority class cluster to handle intra-class imbalance. Experimental results show that the proposed OMOS algorithm outperforms six state-of-the-art SMOTE-based oversampling algorithms across 20 real-world imbalanced datasets and four classifiers: naive Bayes, support vector machine, logistic regression, and decision tree. These results indicate that OMOS is effective for binary imbalanced classification tasks.
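
The clustering and filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give these details: the flat-kernel mean shift, the `bandwidth` value, the majority-vote rule for deciding which clusters "belong to the minority class", and the reading of a "safe" sample as one whose original label matches its cluster's label are all assumptions made for illustration. The function names are hypothetical. The auto-encoding and oversampling steps are not shown.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift (a simplified stand-in); returns a cluster id per point."""
    modes = []
    for p in points:
        current = list(p)
        for _ in range(max_iter):
            # All points inside the window centred on the current estimate.
            neighbors = [q for q in points if math.dist(current, q) <= bandwidth]
            if not neighbors:
                break
            new = [sum(c) / len(neighbors) for c in zip(*neighbors)]
            shift = math.dist(current, new)
            current = new
            if shift < tol:
                break
        modes.append(current)
    # Merge modes that converged to (nearly) the same location.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def safe_minority_samples(points, y, cluster_ids, minority_label=1):
    """Assumed rule: a cluster is 'minority' if most of its members are minority;
    a sample is 'safe' if its original label agrees with its cluster's label."""
    labels_by_cluster = {}
    for cid, label in zip(cluster_ids, y):
        labels_by_cluster.setdefault(cid, []).append(label)
    safe = {}
    for cid, labels in labels_by_cluster.items():
        n_minority = sum(1 for l in labels if l == minority_label)
        if 2 * n_minority > len(labels):
            safe[cid] = [p for p, c, l in zip(points, cluster_ids, y)
                         if c == cid and l == minority_label]
    return safe

# Toy data: a minority blob with one majority intruder,
# and a majority blob with one minority intruder.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (4.9, 5.0), (5.05, 5.05)]
y = [1, 1, 1, 1, 0,
     0, 0, 0, 0, 0, 1]

cluster_ids = mean_shift(X, bandwidth=1.0)
safe = safe_minority_samples(X, y, cluster_ids)
for cid, samples in safe.items():
    print(f"minority cluster {cid}: {len(samples)} safe samples")
```

On this toy data, the minority point sitting inside the majority blob is discarded because its label disagrees with its cluster's label, so it would not be used to learn the distribution from which new samples are drawn.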
