Abstract

In the field of artificial intelligence, classification algorithms tend to be biased toward majority class samples when trained on imbalanced data, which results in low recognition rates for minority class samples. Undersampling techniques address this issue by reducing the number of majority class samples to balance the original data distribution before a classifier is trained on the dataset. However, current clustering-based undersampling methods have limitations that directly affect the original imbalanced dataset and the final classification performance. To address these problems, we propose a novel three-stage undersampling framework with denoising, fuzzy c-means clustering, and representative sample selection (UFFDFR). This framework improves classification performance on imbalanced data by removing noise and unrepresentative samples from the majority class. Experiments on 15 imbalanced datasets demonstrate that UFFDFR effectively removes noise and unrepresentative majority class samples and improves classification performance. Furthermore, UFFDFR outperforms three classic and three state-of-the-art clustering-based undersampling methods in terms of F-measure, G-mean, and AUC across five classification algorithms, a result confirmed by the Friedman test and the Nemenyi post-hoc test.
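The abstract names the three stages of UFFDFR but not their exact rules, so the following is only a minimal Python sketch of a generic denoise, cluster, and select pipeline, not the authors' implementation. Only the fuzzy c-means updates follow the standard algorithm. The denoising rule (drop a majority sample when more than half of its k nearest neighbours belong to another class), the round-robin selection of highest-membership samples, the hypothetical function names (`denoise_majority`, `fuzzy_c_means`, `select_representatives`, `undersample`), and the defaults (`k=5`, `c=3`, fuzzifier `m=2`) are all illustrative assumptions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def denoise_majority(X, y, majority, k=5):
    """Stage 1 (ASSUMED ENN-style rule): drop a majority sample when more
    than half of its k nearest neighbours belong to another class."""
    kept = []
    for i, xi in enumerate(X):
        if y[i] != majority:
            continue
        neighbours = sorted((sq_dist(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        n_other = sum(1 for _, j in neighbours if y[j] != majority)
        if n_other <= k // 2:
            kept.append(xi)
    return kept


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Stage 2: standard fuzzy c-means. Returns (centers, U), where U[i][j]
    is the membership of point i in cluster j and each row sums to 1."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    U = []
    for _ in range(n):  # random initial memberships, normalised per row
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][t] for i in range(n)) / total for t in range(d)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dist = [math.sqrt(sq_dist(points[i], ctr)) for ctr in centers]
            zeros = [j for j in range(c) if dist[j] == 0.0]
            if zeros:  # point sits exactly on a centre: split membership there
                new_row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                new_row = [1.0 / sum((dist[j] / dk) ** p for dk in dist) for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        if max_change < tol:
            break
    return centers, U


def select_representatives(points, U, n_keep):
    """Stage 3 (ASSUMED rule): assign each point to its highest-membership
    cluster, then pick points round-robin across clusters, highest
    membership first, until n_keep points are chosen."""
    c = len(U[0])
    clusters = [[] for _ in range(c)]
    for i, row in enumerate(U):
        j = max(range(c), key=lambda k: row[k])
        clusters[j].append((row[j], i))
    for members in clusters:
        members.sort(reverse=True)
    chosen, pos = [], 0
    target = min(n_keep, len(points))
    while len(chosen) < target:
        for members in clusters:
            if pos < len(members) and len(chosen) < target:
                chosen.append(members[pos][1])
        pos += 1
    return [points[i] for i in chosen]


def undersample(X, y, majority, minority, k=5, c=3, seed=0):
    """Hypothetical pipeline: denoise, cluster, and shrink the majority
    class to the size of the minority class."""
    clean = denoise_majority(X, y, majority, k)
    n_min = y.count(minority)
    _, U = fuzzy_c_means(clean, c, seed=seed)
    reps = select_representatives(clean, U, n_min)
    X_new = reps + [x for x, label in zip(X, y) if label == minority]
    y_new = [majority] * len(reps) + [minority] * n_min
    return X_new, y_new


# Toy data: two majority blobs, one minority blob, one majority "noise" point.
maj = [(i * 0.5 + dx, j * 0.5) for dx in (0.0, 10.0) for i in range(5) for j in range(3)]
mino = [(5.0, 10.0), (5.5, 10.0), (6.0, 10.0), (5.0, 10.5), (5.5, 10.5), (6.0, 10.5)]
noise = (5.5, 10.2)
X = maj + [noise] + mino
y = [0] * len(maj) + [0] + [1] * len(mino)
X_new, y_new = undersample(X, y, majority=0, minority=1)
print(y_new.count(0), y_new.count(1), noise in X_new)  # 6 6 False
```

On this toy data, the majority point placed inside the minority region is removed in stage 1, and the remaining 30 majority samples are reduced to 6 to match the minority count. Round-robin selection is one simple way to keep every cluster represented; a quota proportional to cluster size would be an equally plausible choice.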
