SeqVAE: Sequence variational autoencoder with policy gradient

Ting Gao,Yidong Cui,Fanyu Ding

doi:10.1007/s10489-021-02374-7

Abstract

In the paper, we propose a variant of Variational Autoencoder (VAE) for sequence generation task, called SeqVAE, which is a combination of recurrent VAE and policy gradient in reinforcement learning. The goal of SeqVAE is to reduce the deviation of the optimization goal of VAE, which we achieved by adding the policy-gradient loss to SeqVAE. In the paper, we give two ways to calculate the policy-gradient loss, one is from SeqGAN and the other is proposed by us. In the experiments on them, our proposed method is better than all baselines, and experiments show that SeqVAE can alleviate the “post-collapse” problem. Essentially, SeqVAE can be regarded as a combination of VAE and Generative Adversarial Net (GAN) and has better learning ability than the plain VAE because of the increased adversarial process. Finally, an application of our SeqVAE to music melody generation is available online12.

Full Text