Imbalanced data are common in daily life and in many practical tasks. Many well-established machine learning methods perform poorly on such data because training becomes strongly biased toward the majority-class instances. Among the solutions to this problem, generating data for the minority class is widely considered one of the most effective approaches. However, previous work processes the data sample-wise and does not consider the distribution of each individual attribute. In this paper, to estimate how each attribute contributes to the label, we model the connection between each attribute and the label separately with a Conditional Generative Adversarial Network (CGAN). The newly constructed instances are then purified by a purpose-designed attribute-based minimax filter, and the survivors are concatenated to form the final generated data. In other words, the proposed approach improves on CGAN-based data generation by additionally considering the pattern of every single attribute when constructing new instances. In addition, we extend the framework from binary-class to multi-class imbalanced learning. In the experiments, the improved model is compared against GAN, CGAN, and several standard multi-class oversampling algorithms on several widely used datasets. Results on four common measurements show that the proposed approach achieves comparable or superior performance relative to the competitors.