Decoding speech in the presence of other sources

J.P. Barker, M.P. Cooke, D.P.W. Ellis

doi:10.1016/j.specom.2004.05.002

Abstract

The statistical theory of speech recognition introduced several decades ago has brought about low word error rates for clean speech. However, it has been less successful in noisy conditions. Since extraneous acoustic sources are present in virtually all everyday speech communication conditions, the failure of the speech recognition model to take noise into account is perhaps the most serious obstacle to the application of ASR technology. Approaches to noise-robust speech recognition have traditionally taken one of two forms. One set of techniques attempts to estimate the noise and remove its effects from the target speech. While noise estimation can work in low-to-moderate levels of slowly varying noise, it fails completely in louder or more variable conditions. A second approach utilises noise models and attempts to decode speech taking into account their presence. Again, model-based techniques can work for simple noises, but they are computationally complex under realistic conditions and require models for all sources present in the signal. In this paper, we propose a statistical theory of speech recognition in the presence of other acoustic sources. Unlike earlier model-based approaches, our framework makes no assumptions about the noise background, although it can exploit such information if it is available. It does not require models for background sources, or an estimate of their number. The new approach extends statistical ASR by introducing a segregation model in addition to the conventional acoustic and language models. While the conventional statistical ASR problem is to find the most likely sequence of speech models which generated a given observation sequence, the new approach additionally determines the most likely set of signal fragments which make up the speech signal. Although the framework is completely general, we provide one interpretation of the segregation model based on missing-data theory. 
We derive an efficient HMM decoder, which searches both across subword state and across alternative segregations of the signal between target and interference. We call this modified system the speech fragment decoder. The value of the speech fragment decoder approach has been verified through experiments on small-vocabulary tasks in high-noise conditions. For instance, in a noise-corrupted connected digit task, the new approach decreases the word error rate in the condition of factory noise at 5 dB SNR from over 59% for a standard ASR system to less than 22%.
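The joint search over speech models and segregations described in the abstract can be illustrated with a toy, single-frame sketch. It is not the authors' decoder: the brute-force enumeration stands in for their efficient HMM search, and every name here (`decode_with_fragments`, `ENERGY_FLOOR`, the two "word" models) is hypothetical. The masked-feature score assumes a uniform prior on the speech energy between a floor and the observed value, which is one common missing-data reading and may differ from the paper's exact formulation. The segregation prior is assumed flat.

```python
import itertools
import math

ENERGY_FLOOR = 0.0  # assumed lower bound on speech energy (illustrative only)

def log_gauss(x, mu, sigma):
    """Log density of a 1-D Gaussian, used for features labelled as speech."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def log_bounded_average(y, mu, sigma, floor=ENERGY_FLOOR):
    """Masked feature: the speech energy lies somewhere in [floor, y].
    Score is the average model density over that interval (assumption)."""
    width = max(y - floor, 1e-12)
    mass = norm_cdf(y, mu, sigma) - norm_cdf(floor, mu, sigma)
    return math.log(max(mass, 1e-300)) - math.log(width)

def log_likelihood(y, mask, mu, sigma):
    """mask[i] is True where feature i is attributed to the target speech."""
    return sum(
        log_gauss(x, m, s) if is_speech else log_bounded_average(x, m, s)
        for x, is_speech, m, s in zip(y, mask, mu, sigma)
    )

def best_model(y, mask, models):
    """Conventional decoding step: best model for one fixed segregation."""
    return max(models, key=lambda name: log_likelihood(y, mask, *models[name]))

def decode_with_fragments(y, fragments, models):
    """Search jointly over models and over speech/background fragment labels."""
    best = (-math.inf, None, None)
    for labels in itertools.product([False, True], repeat=len(fragments)):
        mask = [False] * len(y)
        for frag, is_speech in zip(fragments, labels):
            for i in frag:
                mask[i] = is_speech
        for name, (mu, sigma) in models.items():
            score = log_likelihood(y, mask, mu, sigma)
            if score > best[0]:
                best = (score, name, labels)
    return best

# Two toy "word" models over four log-energy channels.
models = {
    "one": ([5.0, 5.0, 1.0, 1.0], [1.0] * 4),
    "two": ([1.0, 1.0, 5.0, 5.0], [1.0] * 4),
}
# "one" was spoken, but a louder source dominates channels 2 and 3.
y = [5.0, 5.0, 7.0, 7.0]
fragments = [[0, 1], [2, 3]]

print(best_model(y, [True] * 4, models))          # all channels treated as speech
print(decode_with_fragments(y, fragments, models)[1:])
```

In this toy case, treating every channel as speech picks the wrong model ("two"), while the joint search picks "one" and labels the second fragment as background. The enumeration grows as 2^F in the number of fragments, which is why an efficient search of the kind the abstract describes is needed in practice.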

Journal: Speech Communication
Publication Date: Sep 11, 2004
Citations: 169

Similar Papers

Transliteration Based Approaches to Improve Code-Switched Speech Recognition Performance
Jesse Emond ... Min Ma
01 Dec 2018

Building Acoustic and Language Model for Continuous Speech Recognition in Bahasa Indonesia
Andreas Widjaja ... Vincent Elbert Budiman
Jurnal Teknik Informatika dan Sistem Informasi | VOL. 6
10 Aug 2020

Future vector enhanced LSTM language model for LVCSR
Qi Liu ... Yanmin Qian
01 Dec 2017

Articulatory motivated acoustic features for speech recognition
Daniil Kocharov ... Ralf Schlüter
04 Sep 2005
