Abstract

Generative adversarial networks (GANs) learn a deep generative model that is able to synthesize novel, high-dimensional data samples. New data samples are synthesized by passing latent samples, drawn from a chosen prior distribution, through the generative model. Once trained, the latent space exhibits interesting properties that may be useful for downstream tasks such as classification or retrieval. Unfortunately, GANs do not offer an "inverse model," a mapping from data space back to latent space, making it difficult to infer a latent representation for a given data sample. In this paper, we introduce a technique, inversion, to project data samples, specifically images, to the latent space using a pretrained GAN. Using our proposed inversion technique, we are able to identify which attributes of a data set a trained GAN is able to model and to quantify GAN performance based on a reconstruction loss. We demonstrate how our proposed inversion technique may be used to quantitatively compare the performance of various GAN models trained on three image data sets. We provide code for all of our experiments at https://github.com/ToniCreswell/InvertingGAN.

Highlights

  • Generative adversarial networks (GANs) [10], [20] are a class of generative models able to synthesize novel, realistic-looking images of faces, digits, and street numbers [20]

  • Radford et al. [20] demonstrated that GANs learn a "rich linear structure," meaning that algebraic operations in Z-space often lead to semantically meaningful synthetic samples in image space

  • We demonstrate several ways in which our proposed inversion technique may be used to compare GAN models both qualitatively (Section VI-B) and quantitatively (Section VII)

Summary

INTRODUCTION

Generative adversarial networks (GANs) [10], [20] are a class of generative models able to synthesize novel, realistic-looking images of faces, digits, and street numbers [20]. Dumoulin et al. [9] (ALI) and Donahue et al. [8] (BiGAN) proposed learning a third, encoder network alongside the generator and discriminator to map image samples back to Z-space; they demonstrated results on MNIST, ImageNet, CIFAR-10, SVHN, and CelebA. Li et al. [16] proposed a method to improve these reconstructions. Drawbacks of these approaches [8], [9], [16] include the need to train a third network, which increases the number of parameters that have to be learned; with more parameters, there is generally a greater chance of overfitting [23], or even of memorizing [12], input samples.

Algorithm 1: Inferring z* ∈ ℝᵈ, the latent representation for an image x ∈ ℝ^(m×m).
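The inversion idea, inferring a latent vector z* whose generated image matches a target x by gradient descent on a reconstruction loss, can be sketched as follows. This is a minimal illustration, not the paper's exact Algorithm 1: the function name `invert_generator`, the latent dimension, the Adam optimizer, and the plain MSE loss are all assumptions for the sketch, and `G` stands for any pretrained generator mapping latent batches to image batches.

```python
import torch

def invert_generator(G, x, latent_dim=100, steps=1000, lr=0.01):
    """Infer z* such that G(z*) approximates x, via gradient descent on z.

    G : pretrained generator, frozen during inversion (only z is updated).
    x : batch of target images to project into the latent space.
    """
    # Initialize z from the prior the GAN was trained with (here, standard normal).
    z = torch.randn(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Reconstruction loss between the generated and target images.
        loss = torch.nn.functional.mse_loss(G(z), x)
        loss.backward()  # gradients flow through G into z; G's weights are untouched
        opt.step()
    return z.detach()
```

Because the loss is minimized over z rather than over network weights, no third encoder network is trained, which is the parameter-count advantage over ALI/BiGAN noted above.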

Result
METHOD
Inverting a Batch of Samples
RELATION TO PREVIOUS WORK
EXPERIMENTS
Omniglot
CelebA
QUANTITATIVELY COMPARING MODELS
CONCLUSION