Abstract

In this paper, we propose Minimalist Transfer Learning (MinTL) to simplify the system design process of task-oriented dialogue systems and alleviate the over-dependency on annotated data. MinTL is a simple yet effective transfer learning framework, which allows us to plug-and-play pre-trained seq2seq models and jointly learn dialogue state tracking and dialogue response generation. Unlike previous approaches, which use a copy mechanism to “carry over” the old dialogue states to the new one, we introduce Levenshtein belief spans (Lev), which allow efficient dialogue state tracking with a minimal generation length. We instantiate our learning framework with two pre-trained backbones, T5 and BART, and evaluate them on MultiWOZ. Extensive experiments demonstrate that: 1) our systems establish new state-of-the-art results on end-to-end response generation; 2) MinTL-based systems are more robust than baseline methods in the low-resource setting, achieving competitive results with only 20% of the training data; and 3) Lev greatly improves inference efficiency.
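
The Levenshtein belief span idea can be pictured as generating only the slots that changed at the current turn and applying them as edits to the previous state, instead of regenerating the full state every turn. Below is a minimal Python sketch of that update step; the (domain, slot, value) triple format and the "NULL" deletion marker are simplifying assumptions for illustration, and the exact Lev span format is the one defined in the paper.

    # Minimal sketch of applying a Levenshtein-style belief delta to the previous state.
    # The (domain, slot, value) triple format and the "NULL" deletion marker are
    # simplifying assumptions; the exact Lev span format is defined in the paper.

    def apply_lev(prev_state: dict, lev_triples: list) -> dict:
        """Return the new dialogue state after applying the generated delta."""
        new_state = dict(prev_state)            # unchanged slots are carried over for free
        for domain, slot, value in lev_triples:
            if value == "NULL":                 # hypothetical marker: the slot was removed
                new_state.pop((domain, slot), None)
            else:                               # slot was added or its value updated
                new_state[(domain, slot)] = value
        return new_state

    # Only the changed slot needs to be decoded, keeping the generation length minimal.
    prev = {("hotel", "area"): "north", ("hotel", "stars"): "4"}
    lev = [("hotel", "area", "centre")]         # the user changed only the hotel area
    print(apply_lev(prev, lev))
    # {('hotel', 'area'): 'centre', ('hotel', 'stars'): '4'}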

Highlights

  • Building robust task-oriented dialogue systems is challenging due to complex system design and the limited availability of human-annotated data (Wen et al., 2017; Wu et al., 2019b)

  • Fine-tuning pre-trained language models improves a wide range of natural language processing applications (Lewis et al., 2019; Raffel et al., 2019), notably machine translation (Conneau and Lample, 2019) and personalized dialogue response generation (Wolf et al., 2019b)

  • We propose Minimalist Transfer Learning (MinTL), a simple yet effective transfer learning framework that allows us to plug-and-play pre-trained sequence-to-sequence (Seq2Seq) models and jointly learn dialogue state tracking (DST) and dialogue response generation; a simplified sketch of this joint setup is shown after this list
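
To make the plug-and-play point concrete, the following minimal sketch fine-tunes a single pre-trained Seq2Seq backbone on both sub-tasks by summing their losses. It assumes the HuggingFace transformers library, the t5-small checkpoint, and made-up flattened text encodings for the dialogue history, belief state, and Lev delta; the paper's actual input format and decoder configuration may differ.

    # Minimal sketch of the plug-and-play idea, assuming the HuggingFace `transformers`
    # library, the t5-small checkpoint, and illustrative flattened text formats for the
    # dialogue history, belief state, and Lev delta; the paper's exact input encoding
    # and decoder setup may differ.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    history = "user: i need a hotel in the centre"
    prev_state = "hotel area north"        # flattened previous belief state (illustrative)

    # 1) Dialogue state tracking: learn to generate only the Lev delta, not the full state.
    dst_in = tokenizer(f"track: {history} state: {prev_state}", return_tensors="pt")
    dst_target = tokenizer("hotel area centre", return_tensors="pt")
    dst_loss = model(input_ids=dst_in.input_ids, labels=dst_target.input_ids).loss

    # 2) Response generation: condition on the history and the updated state.
    rg_in = tokenizer(f"respond: {history} state: hotel area centre", return_tensors="pt")
    rg_target = tokenizer("there are several hotels in the centre .", return_tensors="pt")
    rg_loss = model(input_ids=rg_in.input_ids, labels=rg_target.input_ids).loss

    # Joint learning: sum the two losses and back-propagate through the shared backbone.
    (dst_loss + rg_loss).backward()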

Introduction

Building robust task-oriented dialogue systems is challenging due to complex system design and the limited availability of human-annotated data (Wen et al., 2017; Wu et al., 2019b). Recent progress in pre-training language models has shown promise in alleviating the data scarcity problem (Budzianowski and Vulic, 2019; Wu et al., 2020). Current state-of-the-art (SOTA) approaches in task-oriented dialogue rely on several task-specific modules, such as a State Operation Predictor (Kim et al., 2019) for dialogue state tracking and CopyNet (Gu et al., 2016) for end-to-end dialogue task completion (Lei et al., 2018; Zhang et al., 2019b). Such modules are usually absent in the pre-training stage, so task-specific architecture modifications are required to adapt pre-trained language models to different dialogue tasks.
