Speech production models in automatic speech recognition—Forming a lasting marriage between speech science and speech technology

R. C. Rose, J. Schroeter, M. M. Sondhi, O. Ghitza

doi:10.1121/1.409548

Abstract

At present, the performance of automatic speech recognition (ASR) systems is still limited by variabilities within and between speakers, by acoustic differences between training and application environments, and by the sensitivity of ASR systems to changing communication channels. This talk considers the conjecture that the use of speech-production models in ASR systems can contribute to making ASR systems more robust with respect to these sources of variability. Although it is well known that production-oriented representations of speech may be used to exploit the continuity of articulatory movements, several obstacles stand in the way of incorporating speech production models in recognizers. These include the difficult problem of acoustic-to-articulatory mapping, the huge complexity of searching an articulatory space, and the lack of sensitive diagnostic performance metrics for evaluating strengths and weaknesses of a particular production model. Several research laboratories are actively involved in efforts towards incorporating articulatory models in various forms in working ASR systems. In addition to summarizing this work, mechanisms will be suggested for stimulating closer interaction between researchers in production, perception, and processing.
