Abstract

In this work, we cast sentence generation as an imitation learning problem, in which the goal is to learn a policy that mimics an expert. Recent work has shown that adversarial learning can be applied to imitation learning. However, the reward signal from the discriminator is known to be unstable in reinforcement learning (RL) based generative adversarial networks (GANs), and estimating the state-action value is usually computationally intractable. To address this problem, we propose using two discriminators that provide two different reward signals, yielding a more general imitation learning framework for sequence generation. Monte Carlo (MC) rollouts are therefore unnecessary, which keeps our algorithm computationally tractable when generating long sequences. Furthermore, our policy and discriminator networks are integrated by sharing a state encoder built from dilated convolutions rather than recurrent neural networks (RNNs). In our experiments, we show that the two reward signals control the trade-off between the quality and the diversity of the generated sequences.
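To make the described architecture concrete, the sketch below shows one plausible PyTorch realization of the abstract's three ingredients: a shared state encoder built from causal dilated convolutions, a policy head, and two discriminator heads whose per-step scores are mixed into a single reward. This is not the authors' code; all names, layer sizes, and the mixing weight `alpha` are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's implementation): shared dilated-conv
# state encoder + policy head + two discriminator heads producing per-step
# rewards, so no Monte Carlo rollout is needed to score prefixes.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 5000, 64, 128  # illustrative sizes


class DilatedEncoder(nn.Module):
    """Shared state encoder: a stack of causal dilated convolutions
    (dilations 1, 2, 4, ...) in place of an RNN."""

    def __init__(self, num_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.dilations = [2 ** i for i in range(num_layers)]
        self.convs = nn.ModuleList()
        in_ch = EMBED_DIM
        for d in self.dilations:
            self.convs.append(nn.Conv1d(in_ch, HIDDEN_DIM,
                                        kernel_size=2, dilation=d))
            in_ch = HIDDEN_DIM

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) -> states: (batch, seq_len, HIDDEN_DIM)
        x = self.embed(tokens).transpose(1, 2)  # (B, C, T) for Conv1d
        for conv, d in zip(self.convs, self.dilations):
            x = F.pad(x, (d, 0))                # left-pad only: causal
            x = torch.relu(conv(x))
        return x.transpose(1, 2)


class SharedModel(nn.Module):
    """Policy and two discriminators integrated via the shared encoder."""

    def __init__(self):
        super().__init__()
        self.encoder = DilatedEncoder()
        self.policy_head = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)
        self.disc_head_1 = nn.Linear(HIDDEN_DIM, 1)  # first reward signal
        self.disc_head_2 = nn.Linear(HIDDEN_DIM, 1)  # second reward signal

    def forward(self, tokens: torch.Tensor):
        states = self.encoder(tokens)
        logits = self.policy_head(states)             # next-token policy
        r1 = torch.sigmoid(self.disc_head_1(states))  # per-step score 1
        r2 = torch.sigmoid(self.disc_head_2(states))  # per-step score 2
        return logits, r1.squeeze(-1), r2.squeeze(-1)


def mixed_reward(r1: torch.Tensor, r2: torch.Tensor,
                 alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical convex combination of the two discriminator signals;
    varying alpha would trade off quality against diversity."""
    return alpha * r1 + (1.0 - alpha) * r2
```

Because each discriminator head scores every prefix state produced by the causal encoder, a reward is available at every generation step directly, which is one way the abstract's claim of avoiding MC rollouts could be realized.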
