Abstract

In this paper, we present and evaluate an approach to incremental dialogue act (DA) segmentation and classification. Our approach uses prosodic, lexico-syntactic and contextual features, and achieves an encouraging level of performance both in an offline corpus-based evaluation and in simulated human-agent dialogues. The approach is implemented as a pipeline of sequential processing steps, and we investigate how each step contributes to DA segmentation errors. We report results using both existing and new metrics for DA segmentation. The incremental DA segmentation capability described here may help future systems allow more natural speech from users and support more natural patterns of interaction.
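
The sketch below is only a rough illustration of what an incremental segmentation-and-classification pipeline of this general kind can look like; it is not the implementation described in the paper. The class names, the pause threshold, the cue-word heuristic, and the toy DA labels are assumptions made purely for this example.

    # Illustrative sketch only: names, features, thresholds and DA labels are
    # assumptions for this example, not the system described in the paper.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Word:
        token: str
        start: float       # start time in seconds
        end: float         # end time in seconds

    @dataclass
    class Segment:
        words: List[Word] = field(default_factory=list)
        da_label: Optional[str] = None

    class IncrementalDASegmenter:
        """Consumes one recognized word at a time; whenever a DA boundary is
        hypothesized, the completed segment is labeled and returned."""

        def __init__(self, pause_threshold: float = 0.3):
            self.pause_threshold = pause_threshold
            self.current = Segment()
            self.prev_end: Optional[float] = None

        def _boundary(self, word: Word) -> bool:
            # Toy decision rule combining a prosodic cue (length of the pause
            # before this word) with a lexical cue (feedback words that often
            # form a DA on their own).
            if not self.current.words:
                return False
            pause = 0.0 if self.prev_end is None else word.start - self.prev_end
            ends_on_feedback = self.current.words[-1].token in {"okay", "yeah"}
            return pause >= self.pause_threshold or ends_on_feedback

        def _classify(self, segment: Segment) -> str:
            # Toy DA classifier over the completed segment.
            text = " ".join(w.token for w in segment.words)
            return "Positive-Feedback" if text in {"okay", "yeah"} else "Assert-Identified"

        def feed(self, word: Word) -> Optional[Segment]:
            """Return a completed, labeled segment if a boundary precedes
            this word, otherwise None; the word then starts or extends the
            current segment."""
            completed = None
            if self._boundary(word):
                self.current.da_label = self._classify(self.current)
                completed, self.current = self.current, Segment()
            self.current.words.append(word)
            self.prev_end = word.end
            return completed

For instance, feeding Word("okay", 0.0, 0.3) followed by Word("the", 0.9, 1.1) would cause the segmenter to emit an "okay" segment labeled Positive-Feedback before the second word is processed, since both the pause cue and the cue-word heuristic fire at that point.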

Highlights

  • In this paper, we explore the feasibility of incorporating an incremental dialogue act segmentation capability into an implemented, high-performance spoken dialogue agent that plays a time-constrained image-matching game with its users (Paetzel et al., 2015).

  • These conditions allow us to better understand the sources of observed errors in segment boundaries and/or dialogue act (DA) labels. Our notation for these conditions is a compact encoding of the data sources used to create the transcripts of user speech, the segment boundaries, and the DA labels.

  • In a manual analysis of common error types, we found that the different DA labels used for very short utterances like ‘okay’ (D-M, Positive Feedback (PFB), Assert Identified (As-I)) and ‘yeah’ (A-Y, PFB, As-I) are often confused.

Introduction

In this paper we explore the feasibility of incorporating an incremental dialogue act segmentation capability into an implemented, high-performance spoken dialogue agent that plays a time-constrained image-matching game with its users (Paetzel et al., 2015). It is important to allow users to speak naturally to spoken dialogue systems. It has been understood for some time that this requires a system to be able to automatically segment a user's speech into meaningful units in real time, while the user is still speaking (Nakano et al., 1999). Many systems assume that pauses in the user's speech can be used to determine the segmentation, often by treating each detected pause as indicating a dialogue act (DA) boundary (Komatani et al., 2015).
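
As a toy illustration of this pause-based convention (the threshold and the word timings below are invented for the example, not drawn from the paper's data), such a segmenter simply splits the recognized word stream wherever the silence between consecutive words exceeds a fixed threshold, which also means that mid-utterance hesitations produce spurious boundaries:

    # Toy pause-based segmenter: every inter-word silence longer than a fixed
    # threshold is treated as a dialogue-act boundary. Timings are invented.
    def segment_on_pauses(timed_words, threshold=0.5):
        """timed_words: list of (token, start_sec, end_sec); returns token lists."""
        segments, current, prev_end = [], [], None
        for token, start, end in timed_words:
            if prev_end is not None and start - prev_end > threshold and current:
                segments.append(current)   # pause detected: close the segment
                current = []
            current.append(token)
            prev_end = end
        if current:
            segments.append(current)
        return segments

    # A hesitation pause inside a single description yields a spurious boundary.
    print(segment_on_pauses([
        ("the", 0.0, 0.2), ("guy", 0.2, 0.5),            # 0.8 s hesitation follows
        ("with", 1.3, 1.5), ("the", 1.5, 1.6), ("glasses", 1.6, 2.1),
    ]))
    # -> [['the', 'guy'], ['with', 'the', 'glasses']]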
