Abstract

The imbalanced data classification has gained popularity in machine learning research domain due to its prevalence in numerous applications and its difficulty. However, the majority of contemporary work primarily focuses on addressing between-class imbalance issues. Previous researches have shown that combined with other elements, such as within-class imbalance, small sample size and the presence of small disjuncts, the imbalanced data significantly increase the difficulties for the traditional classifiers to learn. Therefore, we propose a novel MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification. The proposed MeanShift-guided oversampling technique can simultaneously consider the distribution of minority class and majority class within the sphere with the current minority instance as its center, which can favor addressing small sample size and avoiding overlapping issues often caused by the nearest neighbor (NN)-based oversampling techniques. The incorporation of random vector and flexible cut-off mechanism for vector length can enhance the diversity among the generated synthetic minority instances and avoid overlapping, which makes it suitable for small sample size and small disjuncts problems. To address between-class and within-class imbalance issues, we also introduce a self-adaptive sizes assignment strategy for each minority instance to be oversampled, where the assigned size is inversely proportional to its density and its distance from the majority class. In addition to eliminating within-class imbalance, the strategy can ensure that the informative border minority instances have more opportunities to be oversampled, thus improving classification performance. Extensive experimental results on some datasets with different distributions and imbalance ratios show the proposed algorithm outperforms other compared ones with significant difference.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.