Abstract

The synthetic minority oversampling technique (SMOTE) is the most widely used approach to class-imbalance learning. Although SMOTE and its variants handle imbalanced data well in most cases, they fail to exploit the structural information of the overall data, which leads to the propagation of noise. Some existing SMOTE variants remove noisy samples by adding an undersampling step. However, because of the complexity of the data distribution, it is difficult to identify genuinely noisy samples accurately, which degrades modeling quality. To this end, we propose an oversampling technique based on hypergraph identification and Gaussian distribution (HGDO). First, the neighborhood of each sample is reconstructed via sparse representation to build a hypergraph model, and outliers and noisy samples are filtered according to this model. Then, the weight of each retained minority-class sample is determined from the distribution relationship between hyperedges and vertices. Finally, new samples are generated based on the Laplacian matrix and a Gaussian distribution to balance the dataset. A comprehensive experimental analysis demonstrates the superiority of HGDO over several popular SMOTE variants.
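The final generation step described above (weighted selection of retained minority samples, then Gaussian-based synthesis) can be sketched as follows. This is a minimal illustration, not the paper's method: the hypergraph-derived weights are replaced by a caller-supplied (or uniform) weight vector, and the Laplacian-based covariance is replaced by an isotropic Gaussian; the function name `gaussian_oversample` and the `sigma` parameter are assumptions for illustration.

```python
import numpy as np

def gaussian_oversample(X_min, n_new, weights=None, sigma=0.1, seed=None):
    """Hedged sketch of weighted Gaussian oversampling.

    X_min   : (n, d) array of retained minority-class samples.
    n_new   : number of synthetic samples to generate.
    weights : per-sample selection weights (placeholder for the paper's
              hyperedge/vertex-based weights); uniform if None.
    sigma   : standard deviation of the isotropic Gaussian (stands in
              for the paper's Laplacian-derived spread).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Pick seed points in proportion to their weights, then perturb
    # each one with zero-mean Gaussian noise to create a new sample.
    idx = rng.choice(n, size=n_new, p=weights)
    return X_min[idx] + rng.normal(scale=sigma, size=(n_new, d))
```

In practice the weights would come from the hypergraph model, so that well-supported minority samples seed more synthetic points than borderline ones.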
