Class imbalance is a critical challenge in classification tasks, and in recent years, with advances in deep learning, research on GAN-based oversampling techniques has proliferated. These techniques address class imbalance effectively by capturing the distribution of the minority class during training and generating high-quality new samples. However, GAN-based oversampling methods may suffer from vanishing gradients, which can lead to mode collapse, and the samples they generate may introduce noise and blur class boundaries. This paper proposes a novel oversampling method based on a conditional GAN (CGAN) that incorporates the Wasserstein distance. The method first generates an initial balanced dataset by oversampling the minority class with the CGAN, and then applies a noise and boundary recognition procedure based on K-means clustering and the k-nearest neighbors algorithm to address noise and boundary blurring. The generated samples are highly consistent with the original sample distribution, and the method effectively mitigates noisy data and blurred class boundaries. Experimental results on multiple public datasets show significant improvements in evaluation metrics such as Recall, F1-score, G-mean, and AUC.
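To illustrate the kind of post-generation cleaning the abstract describes, the sketch below flags synthetic minority samples that look like noise (far from the dense regions of the real minority class, located via K-means) or that sit on a blurred boundary (their k-nearest real neighbors are dominated by the majority class). This is a minimal, hypothetical sketch of the general idea only; the function name `filter_synthetic`, the thresholds, and the exact decision rules are assumptions, not the paper's specification.

```python
# Hypothetical sketch of a K-means + k-NN noise/boundary filter for
# synthetic minority samples; thresholds and rules are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def filter_synthetic(X_min_real, X_syn, X_maj, n_clusters=5, k=5, radius_factor=1.5):
    """Keep synthetic minority samples that lie near real minority clusters
    and whose neighborhood is not dominated by majority-class points."""
    # 1) Cluster the real minority class to locate its dense regions.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_min_real)
    centers = km.cluster_centers_

    # Per-cluster radius: mean distance of cluster members to their own center.
    dists_real = np.linalg.norm(X_min_real - centers[km.labels_], axis=1)
    radii = np.array([dists_real[km.labels_ == c].mean() for c in range(n_clusters)])

    # 2) Noise test: a synthetic point must fall within radius_factor times the
    #    radius of its nearest minority cluster.
    d_to_centers = np.linalg.norm(X_syn[:, None, :] - centers[None, :, :], axis=2)
    nearest = d_to_centers.argmin(axis=1)
    not_noise = d_to_centers[np.arange(len(X_syn)), nearest] <= radius_factor * radii[nearest]

    # 3) Boundary test: among its k nearest real neighbors, a synthetic point
    #    should have more minority than majority neighbors.
    X_real = np.vstack([X_min_real, X_maj])
    y_real = np.concatenate([np.ones(len(X_min_real)), np.zeros(len(X_maj))])
    nn = NearestNeighbors(n_neighbors=k).fit(X_real)
    _, idx = nn.kneighbors(X_syn)
    minority_votes = y_real[idx].sum(axis=1)
    not_boundary = minority_votes > k / 2

    return X_syn[not_noise & not_boundary]
```

In such a pipeline, the CGAN would first oversample the minority class to balance the data, and a filter along these lines would then discard generated points that are likely noise or that blur the class boundary before training the final classifier.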