Abstract

Neural dialogue models, despite their successes, still suffer from a lack of relevance, diversity, and, in many cases, coherence in their generated responses. These issues can be attributed to several factors, including (1) short-range model architectures that capture limited temporal dependencies, (2) limitations of the maximum likelihood training objective, (3) the concave entropy profile of dialogue datasets, which leads to short and generic responses, and (4) the out-of-vocabulary problem, which leads to the generation of a large number of <UNK> tokens. Transformer-based models such as GPT-2, on the other hand, have demonstrated an excellent ability to capture long-range structure in language modeling tasks. In this paper, we present DLGNet, a transformer-based model for dialogue modeling. We specifically examine the use of DLGNet for multi-turn dialogue response generation. In our experiments, we evaluate DLGNet on the open-domain Movie Triples dataset and the closed-domain Ubuntu Dialogue dataset. DLGNet models, although trained with only the maximum likelihood objective, achieve significant improvements over state-of-the-art multi-turn dialogue models. They also produce the best performance to date on the two datasets across several metrics, including BLEU, ROUGE, and distinct n-gram scores. Our analysis shows that the performance improvement is mostly due to the combination of (1) the long-range transformer architecture and (2) the injection of random informative paddings. Other contributing factors include the joint modeling of dialogue context and response, and the 100% tokenization coverage provided by byte pair encoding (BPE).
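
To make the modeling setup concrete, the following is a minimal sketch (not the authors' released code) of joint context–response modeling with a GPT-2-style transformer, BPE tokenization, and a maximum-likelihood objective, assuming the Hugging Face transformers library. Using the end-of-text token as a turn separator is an assumption for illustration only; the paper's exact input formatting and its random informative paddings are not reproduced here.

```python
# Sketch only: joint modeling of dialogue context and response with a GPT-2-style
# transformer, BPE tokenization, and a maximum-likelihood (next-token) objective.
# The turn separator below is an assumption, not the paper's exact scheme.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# A multi-turn dialogue: context turns followed by the target response.
context_turns = ["hi , how are you ?", "i am fine , thanks . and you ?"]
response = "doing well . any plans for the weekend ?"

# Joint sequence: context and response concatenated into a single token stream,
# so a single language-modeling loss covers both (no separate encoder/decoder).
sep = tokenizer.eos_token  # stand-in turn separator
text = sep.join(context_turns + [response]) + sep
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Maximum-likelihood objective: next-token cross-entropy over the whole sequence.
outputs = model(input_ids, labels=input_ids)
outputs.loss.backward()
print(f"LM loss: {outputs.loss.item():.3f}")
```

Because the BPE vocabulary covers arbitrary byte sequences, rare words in either context or response are split into subword units rather than being replaced by an <UNK> token.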

Highlights

  • Recent successes of pretrained transformer-based language models, such as BERT (Devlin et al., 2019), GPT(-2) (Radford et al., 2018; Radford et al., 2019), Transformer-XL (Dai et al., 2019), XLNet (Yang et al., 2019), and ERNIE(2.0) (Sun et al., 2019a,b), have led to state-of-the-art performance on many natural language understanding (NLU) tasks, including sentence classification, named entity recognition, sentence similarity, and question answering

  • The transformer-based DLGNet provides a significant improvement in response generation performance over existing methods such as (V)HRED, hredGAN, DAIM, and adversarial bootstrapping, all of which are based on recurrent neural networks

  • DLGNet achieves the best performance to date on the Movie Triples and Ubuntu Dialogue datasets in terms of BLEU, ROUGE, and distinct n-gram scores


Summary

Introduction

Recent successes of pretrained transformer-based language models, such as BERT (Devlin et al., 2019), GPT(-2) (Radford et al., 2018; Radford et al., 2019), Transformer-XL (Dai et al., 2019), XLNet (Yang et al., 2019), and ERNIE(2.0) (Sun et al., 2019a,b), have led to state-of-the-art performance on many natural language understanding (NLU) tasks, including sentence classification, named entity recognition, sentence similarity, and question answering. The exceptional performance of transformer-based language models is due to their ability to capture long-term temporal dependencies in the input sequence. This attribute should be very beneficial to dialogue modeling, especially in multi-turn scenarios. Most existing neural dialogue response generation models, however, are based on recurrent neural networks (Sutskever et al., 2014; Vinyals and Le, 2015; Li et al., 2016a; Serban et al., 2016; Xing et al., 2017; Serban et al., 2017b,a; Li et al., 2016b; Zhang et al., 2018a; Olabiyi et al., 2018, 2019a) and still suffer from a lack of relevance, diversity, and coherence in their generated responses. Previous work points to several causes of these limitations:

i) Training data: The presence of high-frequency generic utterances (utterance-level semantic redundancy), such as “I don’t know” and “I’m not sure”, and of high-frequency generic n-gram tokens (word-level syntactic redundancy), such as “I” and “I am”, leads to the concave positional entropy profile of dialogue datasets (see Fig. 1; a toy sketch of this profile follows the list), which makes learning difficult and results in short and generic responses.

ii) Short-range model architecture: Short-range model architectures capture only limited temporal dependencies.

iii) Out-of-vocabulary problem: Less frequent (and usually more informative) words are mapped to the out-of-vocabulary token <UNK>, leading to the generation of a large number of <UNK> tokens.

iv) Exposure bias: The discrepancy in model behavior between training and inference limits the informativeness of the responses.

v) Training objective: The maximum likelihood training objective has inherent limitations.
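
As a rough illustration of the positional entropy profile mentioned in (i), the toy sketch below estimates the entropy of the token distribution at each utterance position over a small invented corpus. The utterances are made up for illustration; this is not the paper's measurement or the data behind Fig. 1.

```python
# Toy sketch: per-position token entropy of a (tiny, invented) dialogue corpus,
# the quantity behind the positional entropy profile discussed above.
import math
from collections import Counter

utterances = [
    "i don't know .",
    "i am not sure .",
    "i think the movie starts at eight .",
    "i am fine , thanks .",
]

def positional_entropy(corpus, position):
    """Shannon entropy (in bits) of the token distribution at a given position."""
    tokens = [u.split()[position] for u in corpus if len(u.split()) > position]
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

for pos in range(5):
    print(f"position {pos}: entropy = {positional_entropy(utterances, pos):.2f} bits")
```

Even on this toy corpus, the entropy is near zero at the first position (almost every utterance starts with “I”), reflecting the word-level redundancy that makes generic openings easy for a model to learn.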

