Meta Learning for Imbalanced Big Data Analysis by using Generative Adversarial Networks

Sanghyun Seo,Yongjin Jeon,Juntae Kim

doi:10.1145/3220199.3220205

Abstract

Imbalanced big data means big data where the ratio of a certain class is relatively small compared to other classes. When the machine learning model is trained by using imbalanced big data, the problem with performance drops for the minority class occurs. For this reason, various oversampling methodologies have been proposed, but simple oversampling leads to problem of the overfitting. In this paper, we propose a meta learning methodology for efficient analysis of imbalanced big data. The proposed meta learning methodology uses the meta information of the data generated by the generative model based on Generative Adversarial Networks. It prevents the generative model from becoming too similar to the real data in minority class. Compared to the simple oversampling methodology for analyzing imbalanced big data, it is less likely to cause overfitting. Experimental results show that the proposed method can efficiently analyze imbalanced big data.

Full Text