Abstract

The synthetic minority oversampling technique (SMOTE) is the most widely used approach to class-imbalance learning. Although SMOTE and its variants handle imbalanced data well in most cases, they fail to exploit the structural information of the overall data, which leads to the propagation of noise. Some existing SMOTE variants remove noisy samples by adding an undersampling step. However, because of the complexity of the data distribution, it is difficult to identify genuinely noisy samples accurately, which degrades modeling quality. To this end, we propose an oversampling technique based on hypergraph identification and Gaussian distribution (HGDO). First, the neighborhood of each sample is reconstructed via sparse representation to build a hypergraph model, and outliers and noisy samples are filtered according to this model. Then, the weight of each retained minority-class sample is determined from the distribution relationship between hyperedges and vertices. Finally, new samples are generated based on the Laplacian matrix and a Gaussian distribution to balance the dataset. A comprehensive experimental analysis demonstrates the superiority of HGDO over several popular SMOTE variants.
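The final generation step described above (weighted selection of retained minority samples, then Gaussian-based synthesis) can be sketched as follows. This is a minimal illustration, not the paper's method: the hypergraph-derived weights are replaced by a caller-supplied (or uniform) weight vector, and the Laplacian-based covariance is replaced by an isotropic Gaussian; the function name `gaussian_oversample` and the `sigma` parameter are assumptions for illustration.

```python
import numpy as np

def gaussian_oversample(X_min, n_new, weights=None, sigma=0.1, seed=None):
    """Hedged sketch of weighted Gaussian oversampling.

    X_min   : (n, d) array of retained minority-class samples.
    n_new   : number of synthetic samples to generate.
    weights : per-sample selection weights (placeholder for the paper's
              hyperedge/vertex-based weights); uniform if None.
    sigma   : standard deviation of the isotropic Gaussian (stands in
              for the paper's Laplacian-derived spread).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Pick seed points in proportion to their weights, then perturb
    # each one with zero-mean Gaussian noise to create a new sample.
    idx = rng.choice(n, size=n_new, p=weights)
    return X_min[idx] + rng.normal(scale=sigma, size=(n_new, d))
```

In practice the weights would come from the hypergraph model, so that well-supported minority samples seed more synthetic points than borderline ones.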
