Abstract

Likelihood-based generative frameworks are receiving increasing attention in the deep learning community, mostly on account of their strong probabilistic foundation. Among them, Variational Autoencoders (VAEs) are valued for their fast and tractable sampling and relatively stable training, but if not properly tuned they may easily produce poor generative performance. The loss function of Variational Autoencoders is the sum of two components with somewhat contrasting effects: the reconstruction loss, improving the quality of the resulting images, and the Kullback-Leibler divergence, acting as a regularizer of the latent space. Correctly balancing these two components is a delicate issue, and one of the major problems of VAEs. Recent techniques address the problem by allowing the network to learn the balancing factor during training, according to a suitable loss function. In this article, we show that learning can be replaced by a simple deterministic computation, expressing the balancing factor in terms of a running average of the reconstruction error over the last minibatches. As a result, we keep a constant balance between the two components throughout training: as reconstruction improves, we proportionally decrease the KL divergence in order to prevent it from prevailing, which would block further improvements in reconstruction quality. Our technique is simple and effective: it clarifies the learning objective for the balancing factor, and it produces faster and more accurate behaviour. On typical datasets such as CIFAR-10 and CelebA, our technique markedly outperforms all previous VAE architectures with comparable parameter capacity.
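The balancing policy can be sketched in a few lines of Python (an illustrative rendering of the idea above, not the authors' code; `base_gamma` and `window` are assumed hyperparameters introduced here):

```python
from collections import deque
import numpy as np

class KLBalancer:
    """Sketch of the deterministic balancing policy described in the abstract
    (not the authors' implementation): the KL weight gamma is tied to a running
    average of the reconstruction error over the last minibatches, so the ratio
    between the two loss components stays roughly constant during training.
    `base_gamma` and `window` are illustrative hyperparameters."""

    def __init__(self, base_gamma=1.0, window=100):
        self.base_gamma = base_gamma
        self.errors = deque(maxlen=window)   # reconstruction errors of recent minibatches

    def gamma(self, rec_error):
        # Record the current minibatch reconstruction error and return the KL
        # weight: as reconstruction improves, gamma shrinks proportionally,
        # preventing the KL term from dominating and blocking further progress.
        self.errors.append(float(rec_error))
        return self.base_gamma * float(np.mean(self.errors))


# usage inside a training loop (rec_loss and kl_loss computed per minibatch):
#   loss = rec_loss + balancer.gamma(rec_loss) * kl_loss
```

Because the weight shrinks together with the running reconstruction error, neither component is allowed to dominate the other as training progresses; no extra learnable parameter or auxiliary loss is required.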

Highlights

  • Generative models address the challenging task of capturing the probabilistic distribution of high-dimensional data, in order to gain insight into their characteristic manifold and pave the way to synthesizing new data samples. The main generative frameworks investigated so far are Generative Adversarial Networks (GAN) [13] and Variational Autoencoders (VAE) [17], [21], both of which have generated an enormous body of work addressing variants, theoretical investigations, and practical applications. The main feature of Variational Autoencoders is that they offer a strongly principled probabilistic approach to generative modeling.

  • The loss function of VAEs is composed of two parts: the first is the log-likelihood of the reconstruction, while the second is a term aimed at enforcing a known prior distribution P(z) over the latent space, typically a spherical normal distribution (see the sketch after this list).

  • The reason why the balancing policy between reconstruction error and KL-regularization addressed in [9] and revisited in this article is so effective seems to lie in its laziness in the choice of the latent representation.
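For concreteness, the two-part loss mentioned in the second highlight can be written as follows (a minimal PyTorch sketch assuming a diagonal Gaussian posterior and a Gaussian decoder, so that the reconstruction log-likelihood reduces to squared error up to additive constants; function and variable names are ours):

```python
import torch

def vae_loss_terms(x, x_rec, mu, logvar):
    """Illustrative decomposition of the VAE objective: a reconstruction term
    (squared error, i.e. a Gaussian decoder log-likelihood up to constants)
    plus the closed-form KL divergence between the diagonal Gaussian posterior
    Q(z|X) = N(mu, diag(exp(logvar))) and the spherical prior P(z) = N(0, I)."""
    rec = torch.sum((x - x_rec) ** 2, dim=-1)
    kl = -0.5 * torch.sum(1.0 + logvar - mu ** 2 - logvar.exp(), dim=-1)
    return rec.mean(), kl.mean()
```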

Summary

INTRODUCTION

The loss function of VAEs is composed of two parts: the first is the log-likelihood of the reconstruction, while the second is a term aimed at enforcing a known prior distribution P(z) over the latent space, typically a spherical normal distribution. This is achieved by minimizing the Kullback-Leibler divergence between Q(z|X) and the prior distribution P(z); as a side effect, this also improves the similarity of the aggregate inference distribution Q(z) = E_X Q(z|X) with the desired prior, which is our final objective. Several techniques have been considered for the correct calibration of the balancing factor γ, including an annealed optimization schedule [6] or a policy enforcing a minimum KL contribution from subsets of latent units [16]. Most of these schemes require hand-tuning and, quoting [26], they risk “tak[ing] away the principled regularization scheme that is built into VAE.”
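In this notation, the γ-balanced objective discussed above takes the generic form below (the exact weighting adopted in the paper may differ in detail):

```latex
\mathcal{L}(X) \;=\; \mathbb{E}_{Q(z\mid X)}\bigl[-\log P(X\mid z)\bigr]
\;+\; \gamma\,\mathrm{KL}\bigl(Q(z\mid X)\,\|\,P(z)\bigr),
\qquad
Q(z) \;=\; \mathbb{E}_{X}\,Q(z\mid X).
```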

VARIATIONAL AUTOENCODERS
GENERATION OF NEW SAMPLES
EMPIRICAL EVALUATION
DISCUSSION