Abstract

Imbalanced class distributions, in which the minority class is under-represented, pose a substantial challenge in machine learning. The Synthetic Minority Over-sampling Technique (SMOTE) has been widely employed to address this issue by generating synthetic minority samples through interpolation. Despite its popularity, SMOTE has drawbacks that stem from its random interpolation between samples. In this paper, we introduce a new data-level oversampling technique, called Fuzzy C-Means Center-SMOTE (FCM-CSMOTE). It generates synthetic samples within each cluster using the cluster center, which is regarded as a memory of the main data components. We demonstrate that the proposed selective strategy has a very low probability of generating noise. Experimental results on 21 real imbalanced data sets, of both regular and large size, show that the proposed method outperforms state-of-the-art approaches on several metrics, including Geometric Mean (GM), F-Measure (FM), Area Under the Curve (AUC), and Accuracy.
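The contrast between SMOTE's random interpolation and center-guided generation can be illustrated with a small sketch in plain Python. The abstract does not specify the exact FCM-CSMOTE generation rule. The rule below is therefore a hypothetical stand-in, not the authors' method: it places each synthetic point between a minority sample and the center of that sample's highest-membership fuzzy cluster. The function names (`smote_interpolate`, `fuzzy_c_means`, `center_oversample`) are also assumptions, as are the farthest-point initialisation, the fuzzifier m = 2, and the iteration count. The fuzzy c-means membership and center updates are the standard ones.

```python
import math
import random


def _dist(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _interpolate(a, b, lam):
    # Point a + lam * (b - a) on the segment from a to b
    return [x + lam * (y - x) for x, y in zip(a, b)]


def smote_interpolate(x, neighbour, rng):
    """Classic SMOTE step: a random point on the segment between a minority
    sample x and one of its minority-class nearest neighbours."""
    return _interpolate(x, neighbour, rng.random())


def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Standard fuzzy c-means. Returns (centers, memberships), where
    memberships[i][j] is the degree to which points[i] belongs to cluster j."""
    # Deterministic farthest-point initialisation (an assumption; random
    # initialisation is also common).
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(_dist(p, q) for q in centers))
        centers.append(list(far))
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        u = []
        for p in points:
            d = [max(_dist(p, q), 1e-12) for q in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for j in range(c)])
        # Center update: mean of all points weighted by u_ij ** m
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers[j] = [sum(wi * p[k] for wi, p in zip(w, points)) / total
                          for k in range(len(points[0]))]
    return centers, u


def center_oversample(minority, c, n_new, m=2.0, seed=0):
    """HYPOTHETICAL center-guided oversampler (not the exact FCM-CSMOTE rule):
    each synthetic point lies at a random position on the segment between a
    minority sample and the center of its highest-membership fuzzy cluster."""
    rng = random.Random(seed)
    centers, u = fuzzy_c_means(minority, c, m=m)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = max(range(c), key=lambda k: u[i][k])  # dominant cluster of sample i
        synthetic.append(_interpolate(minority[i], centers[j], rng.random()))
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
    centers, _ = fuzzy_c_means(minority, c=2)
    print([[round(v, 2) for v in ctr] for ctr in centers])
    print(center_oversample(minority, c=2, n_new=3))
```

Each fuzzy center is a positively weighted mean of minority samples. Consequently, every point produced by `center_oversample` stays inside the convex hull of the minority data. SMOTE, by contrast, draws segments toward arbitrary minority neighbours. This offers one plausible intuition for why a center-guided strategy rarely produces noise, but it is offered here as an illustration, not as the paper's argument.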
