Abstract

Generative Adversarial Networks (GANs) are a thriving unsupervised machine learning technique that has led to significant advances in fields such as computer vision and natural language processing. However, GANs are notoriously difficult to train and commonly suffer from mode collapse and the discriminator winning problem. To interpret the empirical behavior of GANs and to design better ones, we deconstruct the study of GANs into three components and make the following contributions.

Formulation: we propose a perturbation view of the population target of GANs. Building on this interpretation, we connect GANs to the robust statistics framework and propose a novel GAN architecture, termed Cascade GANs, that provably recovers meaningful low-dimensional generator approximations when the real distribution is high-dimensional and corrupted by outliers.

Generalization: given a population target of GANs, we propose a systematic principle, projection under an admissible distance, for designing GANs that meet the population requirement using only finitely many samples. We instantiate the principle in three cases, achieving polynomial and sometimes near-optimal sample complexities: (1) learning an arbitrary generator under an arbitrary pseudonorm; (2) learning a Gaussian location family under total variation distance, where the principle yields a new proof of the near-optimality of the Tukey median viewed as a GAN; (3) learning a low-dimensional Gaussian approximation of an arbitrary high-dimensional distribution under Wasserstein distance. We exhibit a fundamental trade-off between approximation error and statistical error in GANs, and show how to apply the principle in practice, with only empirical samples, to predict how many samples suffice for a GAN to avoid the discriminator winning problem.

Optimization: we show that alternating gradient descent is provably not locally asymptotically stable when optimizing the GAN formulation of PCA. We find that a non-zero minimax duality gap may be one of the causes, and we propose a new GAN architecture whose duality gap is zero and whose game value equals the previous minimax value (not the maximin value). We prove that the new architecture is globally asymptotically stable in solving PCA under alternating gradient descent.
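As background (context we add here, not a claim of the abstract): the classical population objective of a GAN, from Goodfellow et al. (2014), is the two-player minimax game

```latex
\min_{G} \max_{D}\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

The perturbation view above reinterprets what this population target should be when the real distribution p_data is corrupted by outliers.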
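One schematic way to read the "projection under an admissible distance" principle (the notation here is ours, for illustration only): given the empirical distribution \hat{p}_n of the samples, a generator class \mathcal{G}, and an admissible distance L, output the projection

```latex
\hat{g} \;\in\; \arg\min_{g \in \mathcal{G}} \; L\big(\hat{p}_n,\, p_g\big)
```

Cases (1)-(3) above then correspond to different choices of the class \mathcal{G} and the distance L; for instance, case (2) takes \mathcal{G} to be the Gaussian location family with L the total variation distance.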
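The optimization finding concerns the paper's GAN formulation of PCA, but the failure mode can be seen in miniature. Below is a toy sketch (our example, not the paper's game): alternating gradient descent-ascent on the scalar bilinear game min_x max_y xy, whose unique equilibrium is (0, 0), never approaches that equilibrium.

```python
# Toy illustration (not the paper's PCA game): alternating gradient
# descent-ascent on the bilinear game  min_x max_y f(x, y) = x * y.
import math

eta = 0.1        # step size; any 0 < eta < 2 shows the same behavior
x, y = 1.0, 1.0  # start away from the equilibrium (0, 0)

for t in range(10001):
    x -= eta * y  # descent step on x, holding y fixed
    y += eta * x  # ascent step on y, using the updated x
    if t % 2000 == 0:
        print(f"iter {t:5d}   distance to equilibrium = {math.hypot(x, y):.4f}")
```

Each alternating step applies the linear map (x, y) -> (x - eta*y, eta*x + (1 - eta^2)*y), whose matrix has determinant 1, so the iterates orbit the equilibrium instead of converging to it. This is the flavor of failed asymptotic stability the abstract refers to, and it motivates the zero-duality-gap architecture above.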
