Abstract

Generative Adversarial Networks (GANs) have become an active research field because of their ability to generate high-quality simulated data. However, agreement between the generated data distribution and the original data distribution does not guarantee that generated data are always close to real data. GANs have traditionally been applied mainly to images, and generating numeric datasets is more challenging. In this paper, we propose a histogram-based GAN model (His-GAN) that aims to help a GAN produce high-quality generated data. Specifically, we map the generated data and the original data into histograms, compute the proportion of samples falling in each bin, and measure the dissimilarity between the two histograms with traditional f-divergence measures (e.g., Hellinger distance, Jensen–Shannon divergence) and the Histogram Intersection Kernel. We then incorporate this dissimilarity score into GAN training when updating the generator's parameters, since these parameters influence the quality of the generated data. Moreover, we revise the GAN training process so that the model is fed one group of samples at a time (samples from a single class or cluster that share similar characteristics). The generated data therefore reflect the characteristics of a single group, which avoids the difficulty of learning complex characteristics from mixed groups or clusters. In this way, we can generate data that are harder to distinguish from the original data. We conduct extensive experiments on MNIST, CIFAR-10, and a real-world numeric dataset, and the results show the effectiveness of our approach.
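The histogram comparison described above can be illustrated with a minimal sketch. It assumes one-dimensional numeric samples, equal-width bins over the combined range of both samples, and NumPy. The function names (`shared_histograms`, `hellinger`, `jensen_shannon`, `intersection_dissimilarity`) and the default bin count are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def shared_histograms(real, fake, bins=20):
    """Bin both samples on common edges and return per-bin proportions."""
    real = np.asarray(real, dtype=float).ravel()
    fake = np.asarray(fake, dtype=float).ravel()
    lo = min(real.min(), fake.min())
    hi = max(real.max(), fake.max())
    if lo == hi:  # degenerate case: all values identical
        hi = lo + 1.0
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(fake, bins=edges)
    return p / p.sum(), q / q.sum()

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def _kl(a, b):
    """KL divergence in bits, skipping bins where a is zero."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logarithm, in [0, 1]."""
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def intersection_dissimilarity(p, q):
    """One minus the histogram intersection (sum of per-bin minima)."""
    return float(1.0 - np.sum(np.minimum(p, q)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=1000)  # stand-in for original data
    fake = rng.normal(0.5, 1.2, size=1000)  # stand-in for generated data
    p, q = shared_histograms(real, fake, bins=20)
    print(hellinger(p, q), jensen_shannon(p, q), intersection_dissimilarity(p, q))
```

All three scores are 0 for identical histograms and 1 for histograms with no overlapping bins. The sketch covers only the scoring step. How the score enters the generator update is not specified in this abstract, and hard binning with `np.histogram` is not differentiable, so a gradient-based use would need a different formulation.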
