Abstract

Class-imbalanced datasets are common in real-world applications ranging from credit card fraud detection to rare disease diagnosis. Recently, deep generative models have proved successful on an array of machine learning problems, such as semi-supervised learning, transfer learning, and recommender systems. However, their application to class imbalance has been limited. In this paper, we consider class-conditional variants of generative adversarial networks and variational autoencoders and apply them to the class imbalance problem. The main question we seek to answer is whether deep conditional generative models can learn the distributions of minority classes well enough to produce synthetic observations that improve the performance of a downstream classifier. The numerical results show that they can, and that deep generative models outperform traditional oversampling methods in many circumstances, especially under severe imbalance.
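
For readers who want a concrete picture of the pipeline the abstract describes, the sketch below shows one way the idea could be realized with a class-conditional VAE: fit the conditional generative model on imbalanced data, sample synthetic minority-class observations, and train a downstream classifier on the augmented set. This is an illustrative assumption, not the authors' implementation; the toy dataset, network sizes, training schedule, and logistic-regression classifier are all placeholders.

    # Minimal sketch (assumed setup, not the paper's code): conditional-VAE
    # oversampling for an imbalanced binary classification task.
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import train_test_split

    class CVAE(nn.Module):
        """Conditional VAE: encoder and decoder both see a one-hot class label."""
        def __init__(self, x_dim, n_classes, z_dim=8, h_dim=64):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim + n_classes, h_dim), nn.ReLU())
            self.mu = nn.Linear(h_dim, z_dim)
            self.logvar = nn.Linear(h_dim, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim + n_classes, h_dim), nn.ReLU(),
                                     nn.Linear(h_dim, x_dim))

        def forward(self, x, y):
            h = self.enc(torch.cat([x, y], dim=1))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
            return self.dec(torch.cat([z, y], dim=1)), mu, logvar

    # Toy imbalanced data: roughly 5% positives.
    X, y = make_classification(n_samples=5000, n_features=20,
                               weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    Xt = torch.tensor(X_tr, dtype=torch.float32)
    Yt = F.one_hot(torch.tensor(y_tr), num_classes=2).float()

    model = CVAE(x_dim=20, n_classes=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(200):  # full-batch training, for brevity only
        recon, mu, logvar = model(Xt, Yt)
        rec = F.mse_loss(recon, Xt, reduction="sum")
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = rec + kld
        opt.zero_grad(); loss.backward(); opt.step()

    # Sample enough synthetic minority-class (label 1) points to balance the classes.
    n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
    with torch.no_grad():
        z = torch.randn(n_new, 8)  # z_dim = 8, as above
        y_new = F.one_hot(torch.ones(n_new, dtype=torch.long), num_classes=2).float()
        X_new = model.dec(torch.cat([z, y_new], dim=1)).numpy()

    # Downstream classifier trained on real + synthetic observations.
    X_aug = np.vstack([X_tr, X_new])
    y_aug = np.concatenate([y_tr, np.ones(n_new, dtype=int)])
    clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))

The same skeleton applies to the conditional-GAN variant: only the generative model and its training loop change, while the sampling and augmentation steps remain as above.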
