Abstract

In a spoken dialog system, dialog state tracking refers to the task of correctly inferring the state of the conversation -- such as the user's goal -- given all of the dialog history up to that turn. Dialog state tracking is crucial to the success of a dialog system, yet until recently there were no common resources, which hampered progress. The Dialog State Tracking Challenge series of three tasks introduced the first shared testbed and evaluation metrics for dialog state tracking, and has underpinned three key advances in the field: the move from generative to discriminative models; the adoption of discriminative sequential techniques; and the incorporation of the speech recognition results directly into the dialog state tracker. This paper reviews this research area, both covering the challenge tasks themselves and summarizing the work they have enabled.

Highlights

  • Conversational systems are increasingly becoming a part of daily life, with examples including Apple’s Siri, Google Now, Nuance Dragon Go, Xbox and Cortana from Microsoft, and numerous start-ups

  • The spoken language understanding (SLU) result is passed to the dialog state tracker (DST), which updates its estimate of the dialog state

  • The oracle tracker considers the items on the SLU N-best list – it is an “oracle” in the sense that, if a slot/value pair appears that corresponds to the user’s goal, it is added to the state with confidence 1.0
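
The oracle behavior described in the last highlight can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `oracle_update`, the representation of an N-best list as a list of hypotheses (each a list of slot/value pairs), and the state layout (slot mapped to a value and a confidence) are hypothetical choices made here, not the challenge's actual data format.

```python
from typing import Dict, List, Tuple

SlotValue = Tuple[str, str]                # (slot, value), hypothetical layout
State = Dict[str, Tuple[str, float]]       # slot -> (value, confidence)


def oracle_update(state: State,
                  nbest: List[List[SlotValue]],
                  true_goal: Dict[str, str]) -> State:
    """Return a new state after one turn of oracle tracking.

    Every slot/value pair anywhere on the SLU N-best list that matches the
    user's true goal is added to the state with confidence 1.0. Pairs that
    do not match the goal are ignored.
    """
    new_state = dict(state)
    for hypothesis in nbest:
        for slot, value in hypothesis:
            if true_goal.get(slot) == value:
                new_state[slot] = (value, 1.0)
    return new_state


# Usage: the correct value only appears as the second hypothesis,
# but the oracle still picks it up.
goal = {"food": "thai", "area": "north"}
turn1 = [[("food", "tie")], [("food", "thai")]]
state = oracle_update({}, turn1, goal)
print(state)  # {'food': ('thai', 1.0)}
```

Because the sketch consults the true goal, it cannot be run in a deployed system; it only bounds what a tracker could achieve given the SLU output it receives.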

Summary

Introduction

Conversational systems are increasingly becoming a part of daily life, with examples including Apple’s Siri, Google Now, Nuance Dragon Go, Xbox and Cortana from Microsoft, and numerous start-ups. In a typical spoken dialog system, each turn is processed as a pipeline. The user produces an utterance as audio. Automatic speech recognition (ASR) converts this audio into words in text form. Spoken language understanding (SLU) converts these words into a meaning representation. The SLU result is passed to the dialog state tracker (DST), which updates its estimate of the dialog state. The new dialog state is passed to the dialog policy, which decides which action to take. Finally, natural language generation (NLG) converts this action into words, and text-to-speech (TTS) converts the words into audio.
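
The flow of one turn through this pipeline can be made concrete with a toy sketch. Every component below is a hypothetical stand-in: the "audio" is simply a text string, the SLU is keyword spotting over an invented food vocabulary with a made-up confidence of 0.8, and the tracker keeps the highest-confidence value per slot. None of this reflects the trackers or data used in the challenges; it only shows how the stages hand results to one another.

```python
from typing import Dict, List, Tuple

SluHyp = Tuple[str, str, float]            # (slot, value, confidence)
State = Dict[str, Tuple[str, float]]       # slot -> (value, confidence)

FOOD_WORDS = {"thai", "italian", "chinese"}  # hypothetical vocabulary


def asr(audio: str) -> str:
    """Stand-in ASR: the 'audio' is already a transcript in this toy."""
    return audio.lower().strip()


def slu(words: str) -> List[SluHyp]:
    """Stand-in SLU: keyword spotting with an assumed fixed confidence."""
    return [("food", tok, 0.8) for tok in words.split() if tok in FOOD_WORDS]


def track(state: State, hyps: List[SluHyp]) -> State:
    """Stand-in DST: keep the highest-confidence value seen for each slot."""
    new_state = dict(state)
    for slot, value, conf in hyps:
        if slot not in new_state or conf >= new_state[slot][1]:
            new_state[slot] = (value, conf)
    return new_state


def policy(state: State) -> str:
    """Stand-in policy: ask for the food type until it is known."""
    if "food" not in state:
        return "request(food)"
    return f"confirm(food={state['food'][0]})"


def nlg(action: str) -> str:
    """Stand-in NLG: map the two possible actions to fixed templates."""
    if action == "request(food)":
        return "What kind of food would you like?"
    value = action[len("confirm(food="):-1]
    return f"You want {value} food, right?"


def tts(text: str) -> str:
    """Stand-in TTS: wrap the text to mark it as synthesized audio."""
    return f"<audio:{text}>"


def run_turn(state: State, audio: str) -> Tuple[State, str]:
    """Run one user turn through ASR, SLU, DST, policy, NLG and TTS."""
    hyps = slu(asr(audio))
    state = track(state, hyps)
    return state, tts(nlg(policy(state)))


state, reply = run_turn({}, "I want Thai food")
print(state)  # {'food': ('thai', 0.8)}
print(reply)  # <audio:You want thai food, right?>
```

Only `track` corresponds to the dialog state tracking task itself; the other functions exist to show where the tracker's input comes from and where its output goes.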

