Abstract
Imbalanced class distributions, in which the minority class is under-represented, pose a substantial challenge in machine learning. The Synthetic Minority Over-sampling Technique (SMOTE) has been widely employed to address this issue by generating synthetic minority samples through interpolation. Despite its popularity, SMOTE has drawbacks that stem from its random interpolation between samples. In this paper, we introduce a new data-level oversampling technique, called Fuzzy C-Means Center-SMOTE (FCM-CSMOTE), which generates synthetic samples within each cluster using the cluster center, regarded as the memory of the main data components. We demonstrate that the proposed selective strategy has a very low probability of generating noise. The experimental results show that the proposed method outperforms state-of-the-art approaches on 21 real imbalanced data sets (both regular and large-sized) in terms of several metrics, including Geometric Mean (GM), F-Measure (FM), Area Under the Curve (AUC), and Accuracy.
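To make the evaluation metrics concrete, the following minimal sketch computes Accuracy, GM, and FM from the counts of a binary confusion matrix, using their standard definitions and treating the minority class as the positive class. The function name `imbalance_metrics` and the example counts are hypothetical and chosen for illustration only; they are not taken from the paper's experiments.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return Accuracy, G-mean, and F-measure from binary confusion counts.

    Assumption: the minority class is the positive class.
    tp/fn count minority samples, fp/tn count majority samples.
    """
    # Sensitivity (recall): share of minority samples correctly detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of majority samples correctly detected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: share of predicted-minority samples that are truly minority.
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    accuracy = (tp + tn) / (tp + fn + fp + tn)
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Hypothetical data set: 10 minority and 990 majority samples.
# A classifier that always predicts the majority class:
trivial = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)

# A classifier that detects 8 of the 10 minority samples
# at the cost of 40 false alarms:
useful = imbalance_metrics(tp=8, fn=2, fp=40, tn=950)

print(trivial)
print(useful)
```

In this hypothetical case the always-majority classifier reaches 99% accuracy while its GM and FM are both zero, which is why accuracy alone is usually reported alongside GM, FM, and AUC on imbalanced data.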