Abstract
Generative adversarial networks (GANs) have been highly successful at synthesising realistic images, and many claims have been made about their optimality and convergence. But what of vanishing gradients, saturation, and other numerical problems noted by AI practitioners? Attempts to explain these phenomena have so far been based either on purely empirical studies or on differential equations that are valid only in the limit. We take a fresh look at these questions using explicit, low-dimensional models. We revisit the well-known optimal discriminator result and, by constructing a counterexample, show that it is not valid in the case of practical interest, when the dimension of the latent variable is less than that of the data, i.e. dim(z) < dim(x). To examine convergence issues, we consider a 1-D least squares (LS) GAN with exponentially distributed data, a Rayleigh distributed latent variable, a square law generator and a discriminator of the form D(x) = (1 + erf(x))/2, where erf is the error function. We obtain explicit representations of the cost (or loss) function and its derivatives. The representation is exact down to the evaluation of a well-behaved 1-D integral. We present analytical numerical examples of 2-D and 4-D parameter trajectories for gradient-based minimax optimisation. Although the cost function has no saddle points, it generally has a minimum, a maximum and plateau areas. The gradient algorithms typically converge to a plateau, where the gradients vanish and the cost function saturates. This is an undesirable setting with no implications of optimality for either the generator or the discriminator. The analytical method is compared with stochastic gradient optimisation and proven to be a very accurate predictor of the latter’s performance. The quasi-deterministic framework we develop is a powerful analytical tool for understanding the convergence behaviour of low-dimensional GANs based on least-squares cost criteria.
Highlights
Generative adversarial networks (GANs) employ a minimax or game-theoretic framework to derive a mapping from a compressed space of “latent variables” to the space of the 2-D image data
We provide a counterexample to a claim from the literature: that GANs converge to a saddle point of the cost function [28], i.e. a Nash equilibrium
CONCLUSIONS & FURTHER WORK Noting the well recognised numerical difficulties associated with training of generative adversarial networks, and the large number of ad hoc modifications since their intro
Summary
Generative adversarial networks (GANs) employ a minimax or game-theoretic framework to derive a mapping from a compressed space of “latent variables” to the space of the 2-D image data. This example is based on a 1-D latent variable with 1-D data. The error function has already been applied in probabilistic analyses of neural networks: Amari et al. [18] used it to obtain the Fisher information matrix of a single analogue neuron, but it does not seem to have been considered for GANs. Referring to the example just described as the Rayleigh/Square/Exponential/Erf or R/S/E/E case, we derive a number of theoretically rigorous results concerning the LSGAN cost function (section III-B) and its first and second order derivatives (section IV-A and the Appendix).
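The R/S/E/E setting is small enough to simulate directly. The Python sketch below estimates least-squares GAN costs by Monte Carlo and is illustrative only; it is not the closed-form representation derived in the paper. Several things in it are assumptions not stated in the text above: the generator gain `a`, the discriminator weight `w` and bias `b` inside the erf, the common 0/1 label coding of the LSGAN costs, and all function names. One fact it relies on is standard: the square of a Rayleigh variable with scale sigma is exponentially distributed with mean 2·sigma², so a square law generator with gain `a` produces exponential samples with mean 2·a·sigma².

```python
import math
import random


def discriminator(x, w=1.0, b=0.0):
    """Erf-shaped discriminator D(x) = (1 + erf(w*x + b)) / 2.

    w and b are assumed parameters; the abstract only gives D(x) = (1 + erf(x)) / 2.
    """
    return 0.5 * (1.0 + math.erf(w * x + b))


def sample_rayleigh(rng, sigma=1.0):
    """Draw a Rayleigh(sigma) latent variable by inverse-transform sampling."""
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    return sigma * math.sqrt(-2.0 * math.log(u))


def generator(z, a=1.0):
    """Square law generator G(z) = a * z**2 (the gain a is an assumed parameter)."""
    return a * z * z


def lsgan_costs(n, a, w, b, data_mean=1.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of (discriminator cost, generator cost).

    Assumes the common LSGAN coding: real label 1, fake label 0.
    """
    rng = random.Random(seed)
    d_cost = 0.0
    g_cost = 0.0
    for _ in range(n):
        real = rng.expovariate(1.0 / data_mean)        # exponential data
        fake = generator(sample_rayleigh(rng, sigma), a)
        d_real = discriminator(real, w, b)
        d_fake = discriminator(fake, w, b)
        d_cost += 0.5 * ((d_real - 1.0) ** 2 + d_fake ** 2)
        g_cost += 0.5 * (d_fake - 1.0) ** 2
    return d_cost / n, g_cost / n


if __name__ == "__main__":
    # Moderate discriminator parameters: costs respond to the generator gain.
    print(lsgan_costs(20000, a=0.5, w=1.0, b=0.0))
    # Large bias: D is close to 1 on all non-negative inputs, so the costs
    # barely change with a or w. This is a simple example of saturation.
    print(lsgan_costs(20000, a=0.5, w=1.0, b=5.0))
    print(lsgan_costs(20000, a=2.0, w=3.0, b=5.0))
```

With a large bias the discriminator output is essentially 1 for both real and generated samples, so under this label coding the discriminator cost sits near 0.5 and the generator cost near 0 whatever the other parameters are. This is only a hand-built illustration of a flat region with vanishing gradients; the plateaux analysed in the paper come from its exact cost representation.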