Abstract

An increasing number of real-world applications involve classification tasks with severely skewed class distributions, which calls for new algorithms capable of learning from imbalanced datasets. In this paper, a novel oversampling method for numerical imbalanced data, named G-GAN, is proposed based on the GAN framework. In this method, a Gaussian distribution is estimated from the minority samples to provide prior knowledge of the minority class for the latent space of the GAN. To increase the randomness of the generated samples, the generator's noise is drawn by a mixed strategy: part of the noise follows the estimated Gaussian distribution and the rest follows a random distribution. G-GAN is then trained with the idea of Bagging to generate dispersed positive samples, which helps avoid overfitting. G-GAN differs from previous work in that the GAN does not generate minority samples directly; instead, the distribution information of the minority class is added to the latent space of the GAN before the minority samples are generated. Compared with 11 commonly used oversampling methods, G-GAN achieves promising results in terms of G-mean, AUC, F-measure, and ROC using three classifiers on 11 benchmark imbalanced datasets. Furthermore, G-GAN is also validated in terms of AUC on a real imbalanced Diabetes dataset. The results of the two numerical experiments demonstrate that G-GAN holds great potential for imbalanced classification.
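The following is a minimal sketch, not the authors' code, of the mixed latent-noise idea described above: a Gaussian is fitted to the minority samples, and the generator's latent noise is drawn partly from that Gaussian and partly from a random (uniform) distribution. All names and parameters (fit_minority_gaussian, sample_mixed_noise, gaussian_ratio) are hypothetical and introduced only for illustration.

```python
import numpy as np

def fit_minority_gaussian(X_minority):
    """Estimate per-feature mean and standard deviation of the minority class."""
    mu = X_minority.mean(axis=0)
    sigma = X_minority.std(axis=0) + 1e-8  # avoid zero standard deviation
    return mu, sigma

def sample_mixed_noise(mu, sigma, n_samples, gaussian_ratio=0.5, rng=None):
    """Draw latent noise: a fraction from the fitted Gaussian, the rest uniform."""
    rng = rng or np.random.default_rng()
    n_gauss = int(n_samples * gaussian_ratio)
    z_gauss = rng.normal(mu, sigma, size=(n_gauss, mu.shape[0]))
    z_rand = rng.uniform(-1.0, 1.0, size=(n_samples - n_gauss, mu.shape[0]))
    z = np.vstack([z_gauss, z_rand])
    rng.shuffle(z)          # mix the two kinds of noise within the batch
    return z                # this batch would be fed to the GAN generator

# Example: build a latent batch for 64 synthetic minority samples
X_min = np.random.rand(30, 8)            # placeholder minority data (8 features)
mu, sigma = fit_minority_gaussian(X_min)
z_batch = sample_mixed_noise(mu, sigma, n_samples=64)
```

The sketch only covers latent-space construction; the GAN training itself (and the Bagging-style ensemble mentioned above) is omitted.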
