Abstract

While broadly applicable to many natural language processing (NLP) tasks, variational autoencoders (VAEs) are hard to train due to the posterior collapse issue where the latent variable fails to encode the input data effectively. Various approaches have been proposed to alleviate this problem to improve the capability of the VAE. In this paper, we propose to introduce a mutual information (MI) term between the input and its latent variable to regularize the objective of the VAE. Since estimating the MI in the high-dimensional space is intractable, we employ neural networks for the estimation of the MI and provide a training algorithm based on the convex duality approach. Our experimental results on three benchmark datasets demonstrate that the proposed model, compared to the state-of-the-art baselines, exhibits less posterior collapse and has comparable or better performance in language modeling and text generation. We also qualitatively evaluate the inferred latent space and show that the proposed model can generate more reasonable and diverse sentences via linear interpolation in the latent space.
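The regularized objective described in the abstract can be sketched as the standard evidence lower bound (ELBO) plus a weighted mutual-information term. The weight λ below is an assumed trade-off hyperparameter; the paper's exact formulation is not given in this summary.

```latex
\mathcal{L}(\theta,\phi;x)
  = \underbrace{\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right]
    - \mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right)}_{\text{standard ELBO}}
  \;+\; \lambda\, I(x; z)
```

Because I(x; z) is intractable in high dimensions, it is bounded from below via the convex-duality (Donsker–Varadhan) representation, with the supremum taken over statistics functions T parameterized by a neural network:

```latex
I(X; Z) \;\ge\; \sup_{T}\;
  \mathbb{E}_{p(x,z)}\!\left[T(x,z)\right]
  - \log \mathbb{E}_{p(x)p(z)}\!\left[e^{T(x,z)}\right]
```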

Highlights

  • Deep learning architectures are parameterized by families of non-linear functions, which learn multiple levels of more abstract representations (Bengio, 2009; Bengio et al., 2013)

  • We report negative log-likelihood (NLL), KL divergence (KL), perplexity (PPL), mutual information (MI), the number of active units (AU), forward perplexity (FPPL) and reverse perplexity (RPPL)

  • With the mutual information term included in the optimization objective, we believe that VAE-MINE allows more reasonable correspondence patterns between the input and its inferred latent variable, and thus better alleviates posterior collapse
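The convex-duality estimate of MI referenced in the highlights can be illustrated with a toy NumPy computation of the Donsker–Varadhan lower bound. In the actual model the statistics function T is a trained neural network; here a hand-picked quadratic critic stands in for it, so this is a sketch of the bound, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy correlated pair: z = x + noise, so the true I(X; Z) is positive
# (for this Gaussian case, 0.5 * log(1 + 1/0.25) ~= 0.805 nats).
n = 10_000
x = rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)

# Hypothetical statistics function T(x, z). MINE would train a neural
# network here; a fixed scaled product suffices to show the bound.
def T(x, z, a=0.4):
    return a * x * z

# Donsker-Varadhan lower bound:
#   I(X; Z) >= E_{p(x,z)}[T] - log E_{p(x)p(z)}[exp(T)]
joint_term = np.mean(T(x, z))
z_shuffled = rng.permutation(z)  # shuffling z simulates samples from p(x)p(z)
marginal_term = np.log(np.mean(np.exp(T(x, z_shuffled))))
mi_lower_bound = joint_term - marginal_term
print(mi_lower_bound)
```

As expected for a fixed (untrained) critic, the estimate is a strictly smaller lower bound on the true MI; training T tightens it.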


Summary

Introduction

Deep learning architectures are parameterized by families of non-linear functions, which learn multiple levels of increasingly abstract representations (Bengio, 2009; Bengio et al., 2013). The goal is to learn a compact representation that captures the salient structure of highly complex, high-dimensional unlabelled data, so that new data with some variations can be generated. Such models have been widely applied to a range of NLP tasks, such as language modeling (Bowman et al., 2016; Zhao et al., 2018a) and dialog generation (Zhao et al., 2017, 2018b). We focus on the VAE with recurrent neural networks (RNNs) as its encoder and decoder for text generation. The one-step-ahead predictions force RNNs to learn local correlations rather than global coherence, which is insufficient to capture the high-level abstractions that characterize text sequences. The prior p(z) is assumed to be a standard Gaussian distribution N(0, I).
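The standard-Gaussian prior mentioned above admits a closed-form KL term, and sampling from the approximate posterior is done with the reparameterization trick. The sketch below uses NumPy with hand-chosen encoder outputs (mu, logvar) rather than a trained RNN encoder, purely to illustrate the two computations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative encoder outputs for one input: mean and log-variance of
# the diagonal-Gaussian posterior q(z|x). Not from a trained network.
mu = np.array([0.5, -0.3])
logvar = np.array([-0.2, 0.1])

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# which keeps the sample differentiable w.r.t. mu and logvar.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * logvar) * eps

# Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian:
#   0.5 * sum(mu^2 + sigma^2 - logvar - 1)
kl = 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0)
print(z, kl)
```

Posterior collapse corresponds to this KL term being driven toward zero, i.e. q(z|x) matching the prior regardless of x, which is what the MI regularizer counteracts.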


