Abstract

The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences. However, an issue known as posterior collapse (or KL loss vanishing) arises when the VAE is used for text modelling: the approximate posterior collapses to the prior, and the model ignores the latent variables entirely, degrading into a plain language model during text generation. This issue is particularly prevalent when RNN-based VAE models are employed for text modelling. In this paper, we propose a simple, generic architecture called the Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE model. The effectiveness and versatility of our model are demonstrated on different tasks, including language modelling and dialogue response generation.

Highlights

  • Variational Autoencoders (VAE) (Kingma and Welling, 2014; Rezende et al., 2014), together with other deep generative models, including Generative Adversarial Networks (Goodfellow et al., 2014) and autoregressive models (Oord et al., 2018), have attracted considerable attention in the research community, as they have shown the ability to learn compact representations from complex, high-dimensional unlabelled data.

  • We evaluate our Timestep-Wise Regularisation VAE (TWR-VAE) model on three public benchmark datasets, namely, Penn Treebank (PTB) (Marcus and Marcinkiewicz, 1993), Yelp15 (Yang et al., 2017), and Yahoo (Zhang et al., 2015), which have been widely used in previous work on text modelling (Bowman et al., 2016; Kim et al., 2018; Fu et al., 2019; He et al., 2019; Zhu et al., 2020).

  • We report performance on four metrics: negative log likelihood (NLL); perplexity (PPL); KL divergence, which measures the distance between two probability distributions; and the mutual information between the input x and the latent variable z, which measures how much information about x is captured by z.
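Two of these metrics have simple closed forms worth spelling out. Perplexity is the exponentiated per-token NLL, and for a diagonal Gaussian posterior against a standard normal prior the KL divergence is available analytically. A minimal sketch (plain Python; the function names are our own, not from the paper's code):

```python
import math

def perplexity(total_nll: float, n_tokens: int) -> float:
    """Perplexity is the exponentiated average negative log likelihood
    per token: PPL = exp(NLL / #tokens)."""
    return math.exp(total_nll / n_tokens)

def gaussian_kl(mu, logvar) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    summed over latent dimensions:
    0.5 * sum(exp(logvar) + mu^2 - 1 - logvar)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))
```

When the approximate posterior collapses to the prior, `gaussian_kl` goes to zero, which is exactly the "KL loss vanishing" symptom described in the abstract.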

Summary

Introduction

Variational Autoencoders (VAE) (Kingma and Welling, 2014; Rezende et al., 2014), together with other deep generative models, including Generative Adversarial Networks (Goodfellow et al., 2014) and autoregressive models (Oord et al., 2018), have attracted considerable attention in the research community, as they have shown the ability to learn compact representations from complex, high-dimensional unlabelled data. TWR-VAE shares some similarity with existing VAE-RNN models, where the input to the decoder is the latent variable sampled from the variational posterior at the final timestep of the encoder. While this is a reasonable design choice, we explore two model variants of TWR-VAE, namely, TWR-VAEmean and TWR-VAEsum. The contributions of our paper are three-fold: (1) we propose a simple and robust method that can effectively alleviate the posterior collapse issue of the VAE via timestep-wise regularisation; (2) our approach is generic and can be applied to any RNN-based VAE model; (3) our approach outperforms the state of the art on language modelling and yields better or comparable performance on dialogue response generation. The code of TWR-VAE is available at: https://github.com/ruizheliUOA/TWR-VAE
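The idea of timestep-wise regularisation can be illustrated with a small sketch: instead of imposing the KL term only on the posterior at the encoder's final timestep, a KL penalty is applied to the per-timestep posteriors, and the mean/sum variants aggregate the per-timestep latent samples. This is a minimal illustration in plain Python under our own simplifying assumptions (a standard normal prior at every timestep; `aggregate_latents` is a hypothetical helper named by us), not the paper's implementation:

```python
import math
import random

def reparameterise(mu, logvar, rng=random):
    """Standard reparameterisation trick: z = mu + sigma * eps,
    with eps ~ N(0, 1), applied per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def timestep_wise_kl(mus, logvars):
    """Sum KL( q_t || N(0, I) ) over every encoder timestep t, so that
    each per-timestep posterior is regularised, not just the final one."""
    total = 0.0
    for mu, lv in zip(mus, logvars):
        total += 0.5 * sum(math.exp(l) + m * m - 1.0 - l
                           for m, l in zip(mu, lv))
    return total

def aggregate_latents(zs, mode="mean"):
    """Hypothetical aggregation of per-timestep latent samples, in the
    spirit of the TWR-VAEmean / TWR-VAEsum variants: combine the samples
    from all timesteps into a single decoder input."""
    dim = len(zs[0])
    summed = [sum(z[d] for z in zs) for d in range(dim)]
    if mode == "mean":
        return [s / len(zs) for s in summed]
    return summed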

Related Work
Background of VAE
TWR-VAEmean and TWR-VAEsum
Language Modelling
Dialogue Response Generation
Conclusion
B The reparameterisation trick for our timestep-wise latent variables
D Training Details for Language Modelling
F Training Details for Dialogue Response Generation
G Examples of the latent representation interpolation on the Yahoo test dataset