Abstract

Miscommunication in spoken human–computer interaction is unavoidable. Ironically, the user's attempts to repair these miscommunications are even more likely to result in recognition failures, leading to frustrating error “spirals”. In this paper we investigate users' adaptations to recognition errors made by a spoken language system and the impact of these adaptations on models for speech recognition. In analyzing over 300 pairs of original and repeat correction utterances, matched on speaker and lexical content, we found overall increases in utterance and pause duration from original to correction. Here we focus on those adaptations – phonological and durational – that are most likely to adversely impact the accuracy of speech recognizers. We identify several phonological shifts from conversational to clear speech style. We determine that the observed durations of spoken user corrections from a field trial represent increases over, and divergences from, those derived from a speech recognizer's underlying model. Furthermore, words in final position diverge significantly more than those in non-final position, due to the additional effects of phrase-final lengthening. These systematic changes argue for a general model of pronunciation and duration, extending beyond the sentence level to incorporate higher-level dialog features, and illustrate important features for such a model to capture.
