Abstract

This paper presents a Generative Adversarial Network (GAN) for modeling single-turn short-text conversations, which trains a sequence-to-sequence (Seq2Seq) network for response generation simultaneously with a discriminative classifier that measures the difference between human-produced responses and machine-generated ones. In addition, the proposed method introduces an approximate embedding layer to solve the non-differentiability caused by the sampling-based output decoding procedure in the Seq2Seq generative model. The GAN setup provides an effective way to avoid non-informative responses (a.k.a. “safe responses”), which are frequently observed in traditional neural response generators. Experimental results show that the proposed approach significantly outperforms existing neural response generation models on diversity metrics, with slight increases in relevance scores as well, when evaluated on both a Mandarin corpus and an English corpus.
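
The approximate embedding layer admits a compact realization: rather than sampling a word id and looking up its embedding (a non-differentiable step), the decoder's softmax distribution is used to form a probability-weighted average of all word embeddings, which keeps the pipeline differentiable end-to-end. Below is a minimal sketch of this idea in PyTorch; the class and variable names are illustrative, and details of the paper's exact formulation (e.g., any noise injection before the softmax) are not reproduced here.

```python
import torch
import torch.nn as nn

class ApproximateEmbeddingLayer(nn.Module):
    """Differentiable stand-in for sample-then-lookup decoding."""

    def __init__(self, embedding: nn.Embedding):
        super().__init__()
        # Shared with the decoder's word embedding table: (vocab, emb_dim).
        self.embedding = embedding

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # logits: (batch, vocab) -- decoder scores for one output step.
        probs = torch.softmax(logits, dim=-1)
        # Expected embedding under the output distribution:
        # (batch, vocab) @ (vocab, emb_dim) -> (batch, emb_dim).
        return probs @ self.embedding.weight
```

The returned soft embedding can be fed both to the next decoder step and to the discriminator, so gradients from the adversarial loss reach the generator without any discrete sampling.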

Highlights

  • After achieving remarkable successes in Machine Translation (Sutskever et al., 2014; Cho et al., 2014), neural networks with encoder-decoder architectures (a.k.a. sequence-to-sequence models, Seq2Seq) have proven to be an effective method for modeling short-text conversations (Vinyals and Le, 2015; Shang et al., 2015), a task often called Neural Response Generation.

  • Since a safe response can be relevant to a large number of diverse queries, a statistical learner will tend to minimize its empirical risk in the response generation process by capturing those safe responses if naive relevance-oriented loss metrics are employed.

  • We propose a novel variant of the Generative Adversarial Network (GAN) for conversational response generation, which introduces an approximate embedding layer to replace the sampling-based decoding phase, so that the entire model is continuous and differentiable (see the training sketch after this list).
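
To make the adversarial setup concrete, the sketch below pairs such a generator with a binary discriminator that scores (query, response) pairs. Everything here is hypothetical scaffolding inferred from the abstract: `generator` is assumed to emit a sequence of soft word embeddings via the layer sketched above, `discriminator` is assumed to return the probability that a pair is human-produced, and the losses follow the standard GAN objective rather than the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def adversarial_step(generator, discriminator, g_opt, d_opt,
                     query, human_response_emb):
    # --- Discriminator update: human pairs -> 1, generated pairs -> 0. ---
    with torch.no_grad():                      # do not train G on this pass
        fake_emb = generator(query)
    d_real = discriminator(query, human_response_emb)
    d_fake = discriminator(query, fake_emb)
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # --- Generator update: try to fool the discriminator. Gradients flow
    # back through the soft embeddings because no sampling occurred. ---
    scores = discriminator(query, generator(query))
    g_loss = F.binary_cross_entropy(scores, torch.ones_like(scores))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```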


Introduction

After achieving remarkable successes in Machine Translation (Sutskever et al., 2014; Cho et al., 2014), neural networks with encoder-decoder architectures (a.k.a. sequence-to-sequence models, Seq2Seq) have proven to be an effective method for modeling short-text conversations (Vinyals and Le, 2015; Shang et al., 2015), a task often called Neural Response Generation. A major advantage of applying Seq2Seq models to conversation generation is that the training procedure can be performed end-to-end in an unsupervised manner, based on human-generated conversational utterances (typically query-response pairs mined from social networks). Previous research has indicated that naive implementations of Seq2Seq-based conversation models tend to suffer from the so-called “safe response” problem (Li et al., 2016a): such models tend to generate non-informative responses that can be associated with most queries, e.g. “I don’t know”, “I think so”, etc. This is due to the fundamental nature of statistical models, which fit sufficiently observed examples better than insufficiently observed ones. Since a safe response can be relevant to a large number of diverse queries, a statistical learner will tend to minimize its empirical risk in the response generation process by capturing those safe responses if naive relevance-oriented loss metrics are employed.
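
To see why relevance-oriented maximum-likelihood training gravitates toward safe responses, consider a toy corpus in which one generic reply co-occurs with most queries. The snippet below is purely illustrative (the data and the degenerate query-ignoring learner are invented for this example, not taken from the paper):

```python
from collections import Counter

# Toy corpus: a generic reply co-occurs with many distinct queries,
# while each informative reply appears only once.
pairs = [("how was the movie", "i don't know"),
         ("what's for dinner", "i don't know"),
         ("did you pass the exam", "i don't know"),
         ("how was the movie", "the plot dragged a bit"),
         ("what's for dinner", "grilled salmon")]

# In the degenerate case where the model ignores the query, maximizing
# likelihood reduces to picking the most frequent response overall:
response, freq = Counter(r for _, r in pairs).most_common(1)[0]
print(response, freq)  # -> i don't know 3
```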

