Generation and Evaluation of Tabular Data in Different Domains Using GANs

Persevearance Marecha, Lu Ye

doi:10.9734/ajrcos/2023/v16i1331

Abstract

Deep learning techniques such as Generative Adversarial Networks (GANs) provide solutions in many domains where real data must be kept private. Synthesizing tabular data is difficult because of its high complexity: tabular data usually contains a mixture of discrete and continuous columns, which is not easy to model. The contributions of this paper include training and generating data with the original vanilla GAN, followed by CGAN, WGAN-GP, and WCGAN-GP, which perform better than the former. On the Adult Census Income dataset, where the task is to predict whether income exceeds $50,000 per year based on census data, the accuracy of machine learning models is compared and F1 scores are calculated. TimeGAN is then applied to a stock dataset, and the synthetic data is compared against the real data. This paper explores the use of GANs for generating and evaluating tabular data in different domains.

Full Text