Abstract

Real-world datasets frequently have an imbalanced class distribution, which can significantly degrade classification performance. However, some studies suggest that the adverse effects of class imbalance arise only when a dataset also has other intrinsic characteristics, such as class overlap, noise, and data scarcity, of which noise and class overlap have the greatest effect. To address these characteristics in a multi-class setting, we propose the membership-based multi-class resampling and cleaning (MC-MBRC) algorithm, which handles multi-class overlapping data directly. The proposed method divides samples into safe, overlapping, and noisy areas based on their membership degrees and then, according to the influence of the samples in each area on classification performance, applies noise removal, interpolation oversampling, and energy-based cleaning of the overlapping region. An extensive comparison on various datasets shows that, compared with state-of-the-art methods, the proposed method achieves statistically significant improvements across several classification performance metrics and is robust to data containing class overlap and label noise. Furthermore, on datasets without complex intrinsic characteristics, MC-MBRC does not significantly degrade classification performance.
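
The partitioning step can be illustrated with a minimal sketch. The abstract does not define the membership function or the region thresholds, so everything here is an assumption made for illustration: membership is taken to be the fraction of a sample's k nearest neighbours that share its label, and the names `knn_membership`, `partition_samples`, `noise_max`, and `safe_min` are hypothetical. It is not the MC-MBRC implementation, and it omits the oversampling and energy-based cleaning steps.

```python
import numpy as np


def knn_membership(X, y, k=5):
    """Assumed stand-in for a membership degree: the fraction of each
    sample's k nearest neighbours (Euclidean) that share its label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbours
    return (y[nn] == y[:, None]).mean(axis=1)


def partition_samples(X, y, k=5, noise_max=0.0, safe_min=0.8):
    """Label each sample 'safe', 'overlapping', or 'noisy'.

    The thresholds are illustrative assumptions, not values from the paper:
    membership <= noise_max -> 'noisy', membership >= safe_min -> 'safe',
    anything in between -> 'overlapping'.
    """
    m = knn_membership(X, y, k)
    region = np.full(len(m), "overlapping", dtype=object)
    region[m <= noise_max] = "noisy"
    region[m >= safe_min] = "safe"
    return m, region
```

Because the membership is computed per sample against its own label, the same code applies to any number of classes. In a fuller pipeline, samples flagged as noisy would be candidates for removal, while safe and overlapping samples would be handled by the resampling and cleaning steps described above.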
