Abstract

Sequential variational autoencoders (VAEs) with a global latent variable z have been studied for disentangling the global features of data, which is useful for several downstream tasks. To further assist the sequential VAEs in obtaining meaningful z, existing approaches introduce a regularization term that maximizes the mutual information (MI) between the observation and z. However, by analyzing the sequential VAEs from the information-theoretic perspective, we claim that simply maximizing the MI encourages the latent variable to have redundant information, thereby preventing the disentanglement of global features. Based on this analysis, we derive a novel regularization method that makes z informative while encouraging disentanglement. Specifically, the proposed method removes redundant information by minimizing the MI between z and the local features by using adversarial training. In the experiments, we trained two sequential VAEs, state-space and autoregressive model variants, using speech and image datasets. The results indicate that the proposed method improves the performance of downstream classification and data generation tasks, thereby supporting our information-theoretic perspective for the learning of global features.
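The abstract's adversarial MI-minimization can be illustrated with the density-ratio trick: a discriminator trained to distinguish joint samples (z, s) from shuffled pairs yields an estimate of I(z; s), which an encoder could then be trained to minimize. Below is a minimal, self-contained sketch of the estimator alone on toy correlated Gaussians; the features, learning rate, and sample counts are illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
z = rng.standard_normal(n)
s = 0.9 * z + np.sqrt(1 - 0.81) * rng.standard_normal(n)  # corr(z, s) = 0.9

def feats(z, s):
    # Quadratic features suffice for a Gaussian log density ratio.
    return np.stack([np.ones_like(z), z, s, z * z, s * s, z * s], axis=1)

# Positives: joint samples (z_i, s_i); negatives: shuffled pairs ~ p(z)p(s).
X = np.concatenate([feats(z, s), feats(z, rng.permutation(s))])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Full-batch logistic regression as the discriminator.
w = np.zeros(X.shape[1])
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.2 * X.T @ (p - y) / len(y)

# Density-ratio estimate: I(z; s) ~= E_joint[log D / (1 - D)] = E_joint[w . f(z, s)].
mi_est = float(np.mean(feats(z, s) @ w))
```

For this toy pair the closed-form value is I(z; s) = -0.5 ln(1 - 0.9^2) ≈ 0.83 nats, so the estimate can be sanity-checked against it. In the adversarial setting, the encoder would receive the negative of this estimate as a regularization signal.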

Highlights

  • Uncovering the global factors of variation from high-dimensional data is a significant and relevant problem in representation learning (Bengio et al 2013)

  • Sequential variational autoencoders (VAEs) with a global latent variable z play an important role in the unsupervised learning of global features

  • A typical issue is that the latent variable z is ignored by an expressive decoder, such as a state-space model (SSM) or an autoregressive model (ARM), and becomes uninformative, which is referred to as posterior collapse (PC). This phenomenon occurs because such decoders can model the data well on their own: the additional latent variable z cannot assist in improving the evidence lower bound (ELBO), which is the objective function of VAEs, so the decoders will not use z (Chen et al 2017; Alemi et al 2018)


Summary

Introduction

Uncovering the global factors of variation in high-dimensional data is a significant and relevant problem in representation learning (Bengio et al 2013). Sequential VAEs with a global latent variable z address this problem, but they are prone to posterior collapse: with expressive decoders, such as SSMs or ARMs, the additional latent variable z cannot assist in improving the evidence lower bound (ELBO), which is the objective function of VAEs, so the decoders will not use z (Chen et al 2017; Alemi et al 2018). To alleviate this issue, existing approaches regularize the mutual information (MI) between x and z to be large, for example by using β-VAE (Alemi et al 2018) or adversarial training (Makhzani and Frey 2017). We evaluated the ability of controlled generation using a novel evaluation method inspired by Ravuri and Vinyals (2019), and confirmed that CMI-maximizing regularization consistently outperformed MI-maximizing regularization. These results support our information-theoretic view of learning global features: sequential VAEs can acquire redundant features when merely maximizing the MI. The two terms I(x; z) and I(z; s) were shown to work complementarily in our experiments with two models across two domains (speech and image datasets), indicating that the proposed method would help improve various previously proposed sequential VAEs.
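The redundancy argument rests on the chain rule of mutual information: when a local feature s is a deterministic function of x, I(x; z) = I(s; z) + I(x; z | s), so maximizing I(x; z) alone can be satisfied by inflating the unwanted I(s; z) term rather than the conditional term. A small discrete sketch (toy distribution; all probabilities and the choice of s as the parity of x are hypothetical) verifies this identity numerically:

```python
import math
from collections import defaultdict

def mi(joint):
    """I(A;B) in nats from a dict {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def cmi(joint3):
    """I(A;B|C) in nats from a dict {(a, b, c): p}."""
    pc, pac, pbc = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint3.items():
        pc[c] += p
        pac[(a, c)] += p
        pbc[(b, c)] += p
    return sum(p * math.log(p * pc[c] / (pac[(a, c)] * pbc[(b, c)]))
               for (a, b, c), p in joint3.items() if p > 0)

# Toy joint p(x, z): 4 observations, 2 latent codes.
pxz = {(0, 0): 0.2, (1, 0): 0.1, (2, 0): 0.05, (3, 0): 0.05,
       (0, 1): 0.05, (1, 1): 0.05, (2, 1): 0.2, (3, 1): 0.3}

def s_of(x):
    return x % 2  # "local feature" s as a deterministic function of x

pxzs = {(x, z, s_of(x)): p for (x, z), p in pxz.items()}
psz = defaultdict(float)
for (x, z), p in pxz.items():
    psz[(s_of(x), z)] += p

I_xz = mi(pxz)                 # I(x; z)
I_sz = mi(dict(psz))           # I(s; z)
I_xz_given_s = cmi(pxzs)       # I(x; z | s)
# Chain rule: I(x; z) = I(s; z) + I(x; z | s)
```

The CMI-maximizing view follows directly: rewarding I(x; z | s) = I(x; z) - I(s; z) makes z informative about x while penalizing any information it shares with the local feature s.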

Sequential VAEs for learning global representations
State space model with global latent variable
Autoregressive model with global latent variable
Mutual information‐maximizing regularization for sequential VAEs
Problem in MI‐maximizing regularization
Conditional mutual information‐maximizing regularization
Estimation method of the regularization term
Objective function for DSAEs and PixelCNN‐VAEs
Related works
Settings
Speaker verification experiment with disentangled sequential autoencoders
Unsupervised learning for image classification
Controlled generation
Findings
Discussions and future works