A MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification

Xinmin Tao, Shan Huang, Lin Qi, Zhiting Fan, Yujia Zheng, Xiaohan Zhang

doi:10.1016/j.ins.2024.120699

Abstract

Imbalanced data classification has attracted growing attention in machine learning research because it arises in numerous applications and is difficult to handle. However, most contemporary work focuses primarily on between-class imbalance. Previous research has shown that, when combined with other factors such as within-class imbalance, small sample size, and the presence of small disjuncts, imbalanced data make learning significantly harder for traditional classifiers. We therefore propose a novel MeanShift-guided oversampling method with self-adaptive sizes for imbalanced data classification. The proposed MeanShift-guided oversampling technique simultaneously considers the distributions of the minority class and the majority class within the sphere centered on the current minority instance, which helps address small sample size and avoids the overlapping often caused by nearest neighbor (NN)-based oversampling techniques. Incorporating a random vector and a flexible cut-off mechanism for the vector length enhances the diversity of the generated synthetic minority instances and avoids overlapping, which makes the method suitable for small sample size and small disjuncts problems. To address both between-class and within-class imbalance, we also introduce a self-adaptive size assignment strategy for each minority instance to be oversampled, in which the assigned size is inversely proportional to the instance's density and to its distance from the majority class. In addition to eliminating within-class imbalance, this strategy ensures that informative border minority instances have more opportunities to be oversampled, thus improving classification performance. Extensive experimental results on datasets with different distributions and imbalance ratios show that the proposed algorithm outperforms the compared algorithms with significant differences.
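The size assignment idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration, not the authors' method: the function name `adaptive_sizes`, the density proxy (inverse mean distance to the `k` nearest minority neighbors), the use of the distance to the nearest majority instance, the combination of the two as a product, the target of fully balancing the classes, and the largest-remainder rounding.

```python
import numpy as np

def adaptive_sizes(X_min, X_maj, k=3, eps=1e-12):
    """Hypothetical per-instance oversampling counts (illustrative only).

    Each minority instance gets a weight inversely proportional to
    (assumed density proxy) * (distance to nearest majority instance).
    """
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min, n_maj = len(X_min), len(X_maj)
    if n_min < 2 or n_maj < 1:
        raise ValueError("need at least 2 minority and 1 majority instances")

    # Assumed target: generate enough instances to balance the two classes.
    total = max(n_maj - n_min, 0)

    # Pairwise minority-to-minority distances (zeros on the diagonal).
    d_mm = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    k_eff = min(k, n_min - 1)
    # Skip column 0 of each sorted row (the self-distance), keep k nearest.
    knn = np.sort(d_mm, axis=1)[:, 1:k_eff + 1]
    # Assumed density proxy: inverse of the mean k-nearest-neighbor distance.
    density = 1.0 / (knn.mean(axis=1) + eps)

    # Distance from each minority instance to its closest majority instance.
    d_mj = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2)
    d_maj = d_mj.min(axis=1)

    # Sparse and near-border instances receive larger weights.
    weights = 1.0 / (density * d_maj + eps)
    weights /= weights.sum()

    # Largest-remainder rounding so the counts sum exactly to `total`.
    raw = weights * total
    sizes = np.floor(raw).astype(int)
    leftover = int(total - sizes.sum())
    order = np.argsort(raw - sizes)[::-1]
    sizes[order[:leftover]] += 1
    return sizes

# Toy data: a tight minority cluster plus one isolated minority point
# that lies close to the majority class.
X_min = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
X_maj = [[4.0 + 0.1 * i, 4.0] for i in range(10)]
sizes = adaptive_sizes(X_min, X_maj)
```

On this toy data the isolated point near the majority class receives the largest share of the synthetic instances, which matches the abstract's statement that sparse, border minority instances get more opportunities to be oversampled. The generation step itself (the MeanShift-guided direction, random vector, and length cut-off) is not sketched because the abstract does not specify it.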

More From: Information Sciences

Similar Papers

A hierarchical heterogeneous ant colony optimization based oversampling algorithm using feature similarity for classification of imbalanced data
Sreeja N.K ... Sreelaja N.K
Applied Soft Computing | VOL. 166
04 Sep 2024

Cross-Concatenation: Tackling Uncertainty in Imbalanced Big Data Classification
Hadi Mansourifar ... Weidong Shi
15 Dec 2021

Deep Learning for Imbalanced Multimedia Data Classification
Yilin Yan ... Min Chen
01 Dec 2015

Modeling of class imbalance using an empirical approach with spambase dataset and random forest classification
Kiranmayi Kotipalli ... Shan Suthaharan
13 Oct 2014
