Abstract

The underlying differences in linguistic patterns between general text and task-oriented dialogue make existing pre-trained language models less useful in practice. In this work, we unify nine human-human, multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into masked language modeling. We also propose a contrastive objective function to simulate the response selection task. Our pre-trained task-oriented dialogue BERT (TOD-BERT) outperforms strong baselines such as BERT on four downstream task-oriented dialogue applications: intention recognition, dialogue state tracking, dialogue act prediction, and response selection. We also show that TOD-BERT has a stronger few-shot ability, which can mitigate the data scarcity problem in task-oriented dialogue.
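The two pre-training modifications described above, adding speaker tokens for masked language modeling and a contrastive response-selection objective, can be illustrated roughly as follows. This is a minimal sketch rather than the authors' released code; it assumes the Hugging Face transformers and PyTorch libraries, and the token names, batching, and loss details are illustrative.

```python
# Minimal sketch (not the authors' code) of the two TOD-BERT pre-training ideas:
# (1) speaker tokens prepended to each turn for masked language modeling, and
# (2) a contrastive response-selection objective with in-batch negatives.
# Assumes Hugging Face `transformers` and PyTorch; names here are illustrative.
import torch
import torch.nn.functional as F
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Add dialogue role tokens so the model can distinguish speakers in a flattened dialogue.
tokenizer.add_special_tokens({"additional_special_tokens": ["[usr]", "[sys]"]})

model = BertModel.from_pretrained("bert-base-uncased")
model.resize_token_embeddings(len(tokenizer))  # account for the newly added tokens


def flatten_dialogue(turns):
    """Prefix each turn with its speaker token, e.g. '[usr] hi [sys] hello'."""
    return " ".join(f"[{speaker}] {utterance}" for speaker, utterance in turns)


def contrastive_response_loss(context_texts, response_texts):
    """In-batch-negative response selection: the i-th context should score
    highest with the i-th response among all responses in the batch."""
    ctx = tokenizer(context_texts, padding=True, truncation=True, return_tensors="pt")
    rsp = tokenizer(response_texts, padding=True, truncation=True, return_tensors="pt")
    ctx_emb = model(**ctx).last_hidden_state[:, 0]   # [CLS] vectors, shape (B, H)
    rsp_emb = model(**rsp).last_hidden_state[:, 0]
    scores = ctx_emb @ rsp_emb.t()                   # (B, B) similarity matrix
    labels = torch.arange(scores.size(0))            # diagonal entries are positives
    return F.cross_entropy(scores, labels)


# Toy batch of (dialogue context, gold response) pairs.
contexts = [
    flatten_dialogue([("usr", "i need a cheap hotel"), ("sys", "for which area?")]),
    flatten_dialogue([("usr", "book a table for two tonight")]),
]
responses = [
    "[sys] the city centre has several budget options.",
    "[sys] sure, what time would you like the reservation?",
]
loss = contrastive_response_loss(contexts, responses)
loss.backward()
```

In practice this contrastive term would be combined with the masked language modeling loss over the role-tagged dialogues; the in-batch-negative formulation shown here is one common way to simulate response selection during pre-training.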

Highlights

  • Pre-trained models with self-attention encoder architectures (Devlin et al., 2018; Liu et al., 2019) have been commonly used in many NLP applications

  • By further fine-tuning these representations, researchers have continuously reported breakthroughs on various downstream tasks, especially in natural language understanding

  • We first conduct experiments using the full datasets, and then simulate a few-shot setting to show the strength of our TOD-BERT

Summary

Introduction

Pre-trained models with self-attention encoder architectures (Devlin et al., 2018; Liu et al., 2019) have been commonly used in many NLP applications. Such models are self-supervised on massive general text corpora, such as English Wikipedia or books (Zhu et al., 2015). Pre-training dialogue language models on chit-chat corpora from social media, such as Twitter or Reddit, has recently been investigated, especially for dialogue response generation (Zhang et al., 2019) and retrieval (Henderson et al., 2019b). Although these open-domain dialogues are diverse and easy to collect, they are usually short, noisy, and without specific chatting goals.
