From prepared speech to spontaneous speech recognition system

Richard Dufour

doi:10.1145/1456223.1456345

Abstract

Automatic speech recognition (ASR) systems have more trouble processing spontaneous speech (e.g. debates) than prepared speech (e.g. broadcast news). These difficulties are due to peculiarities of spontaneous speech (false starts, repetitions, schwa, etc.). In this paper, we highlight some of these peculiarities, especially in French. We show that manual transcriptions that are unrelated to the target application, but that contain only very spontaneous speech, make it possible to estimate a better language model, strongly decreasing perplexity and significantly decreasing the word error rate on spontaneous speech. However, other knowledge bases used by the ASR system also have to be adapted. For example, our work shows that adding specific pronunciation variants seems useful, but that this addition has to be constrained and modeled. Finally, we compare the errors of our CMU Sphinx-based ASR system on spontaneous vs. prepared speech.
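As a reminder of how the word error rate mentioned above is conventionally computed, the following minimal sketch counts word-level substitutions, deletions and insertions with a standard edit-distance recurrence and divides by the reference length. The function name `word_error_rate` and the French example sentence are illustrative assumptions, not taken from the paper, and the paper's own scoring tool may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not the scoring tool used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: the recognizer drops the "ne" of a spoken French negation.
print(word_error_rate("je ne sais pas", "je sais pas"))
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.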

Full Text