Abstract

Solving the convergence issues of Generative Adversarial Networks (GANs) is one of the most prominent open problems in generative modeling. In this work, we propose a novel activation function to be used as the output of the generator. This activation function is based on the Smirnov probabilistic transformation and is specifically designed to improve the quality of the generated data. In contrast to previous works, which address only the replication of categorical variables, our activation function provides a more general approach that handles any type of data distribution (continuous or discrete). Moreover, the activation function is differentiable and can therefore be seamlessly integrated into the backpropagation computations of the GAN training process. To validate this approach, we first evaluate our proposal on two different data sets: a) an artificially generated data set containing a mixture of discrete and continuous variables, and b) a real data set of flow-based network traffic containing both normal connections and cryptomining attacks. Three publicly available data sets were added to the evaluation to generalize the obtained results. To assess the fidelity of the generated data, we analyze the results both in terms of statistical quality measures and with respect to the use of the synthetic data to train a nested machine learning-based classifier. The experimental results show that a Wasserstein GAN (WGAN) equipped with this new activation function clearly outperforms both a naïve mean-based generator and a standard WGAN. The quality of the generated data allows real data to be fully replaced with synthetic data when training the nested classifier, without a significant loss of accuracy.
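
As a rough illustration of the idea (not the authors' implementation, which the abstract does not detail), the sketch below shows a PyTorch output activation based on inverse-transform (Smirnov) sampling: raw generator outputs are squashed to (0, 1) and mapped through a piecewise-linear approximation of the empirical inverse CDF of a real variable, which keeps the mapping differentiable almost everywhere and hence usable in backpropagation. The class name, the sigmoid squashing step, and the number of interpolation knots are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class SmirnovActivation(nn.Module):
    """Hypothetical output activation mapping generator outputs to the
    empirical distribution of a real variable via inverse-transform
    (Smirnov) sampling."""

    def __init__(self, real_samples: torch.Tensor, n_knots: int = 256):
        super().__init__()
        # Empirical quantiles of the real data serve as knots of a
        # piecewise-linear inverse CDF; stored as a buffer so they move
        # with the module but receive no gradient updates.
        probs = torch.linspace(0.0, 1.0, n_knots)
        self.register_buffer("quantiles", torch.quantile(real_samples, probs))
        self.n_knots = n_knots

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Squash raw generator outputs into (0, 1): a pseudo-uniform u.
        u = torch.sigmoid(logits)
        # Locate the surrounding quantile knots and interpolate linearly
        # between them; the interpolation keeps the output differentiable
        # with respect to u (and thus to the generator parameters).
        pos = u * (self.n_knots - 1)
        lo = pos.floor().long().clamp(max=self.n_knots - 2)
        frac = pos - lo.float()
        q = self.quantiles
        return q[lo] * (1.0 - frac) + q[lo + 1] * frac


# Example usage: fit the activation to one real marginal, then map
# generator outputs through it to obtain realistic samples.
real_col = torch.randn(10_000).exp()      # a skewed "real" variable
act = SmirnovActivation(real_col)
fake = act(torch.randn(32, 1))            # generator logits -> samples
```

Note that for a discrete variable the raw empirical inverse CDF is a step function with zero gradient almost everywhere; interpolating between quantile knots, as above, is one plausible way to retain usable gradients, consistent with the differentiability requirement stated in the abstract.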
