Evaluation of monaural and binaural speech enhancement for robust auditory-based automatic speech recognition

Michael Kleinschmidt, Birger Kollmeier, Thomas Wittkop

doi:10.1121/1.425340

Abstract

A major deficiency of state-of-the-art automatic speech recognition systems is their lack of robustness to additive and convolutive noise. The model of auditory perception developed by Dau et al. [J. Acoust. Soc. Am. 99, 3615–3622 (1996)] for psychoacoustical purposes partly overcomes these difficulties when used as a front end for speech recognition. Especially in combination with locally recurrent neural networks (LRNN), the model output, called the "internal representation," has been shown to provide highly robust feature vectors [Tchorz and Kollmeier, J. Acoust. Soc. Am. (submitted)]. To further improve the performance of this auditory-based LRNN recognition system in background noise, different speech enhancement methods were examined. The minimum mean-square error (MMSE) short-term spectral amplitude (STSA) estimator proposed by Ephraim and Malah [IEEE Trans. Acoust., Speech, Signal Process. 32, 1109–1121 (1984)] was compared to a binaural Wiener filter based on directional and coherence cues [Wittkop et al., this meeting]. Both noise reduction algorithms yield highly improved recognition rates in nonreverberant noisy conditions, while performance on clean speech is not significantly affected. The algorithms were also evaluated in real-world reverberant conditions with speech-simulating noise and jammer speech.
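As a rough illustration of the kind of spectral-domain noise reduction the abstract refers to, the sketch below computes a generic per-bin Wiener-type gain from short-term power estimates. It is neither the Ephraim-Malah MMSE STSA estimator nor the binaural filter of Wittkop et al.; the function name `wiener_gain`, the maximum-likelihood a priori SNR estimate, and the spectral floor value are assumptions made only for this example.

```python
def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Per-bin Wiener-type gain for one short-term spectrum (illustrative).

    noisy_power: power of the noisy signal in each frequency bin.
    noise_power: estimated noise power in each bin (assumed > 0).
    floor: hypothetical lower bound on the gain to limit musical noise.
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        # Maximum-likelihood a priori SNR estimate, clipped at zero.
        xi = max(y / n - 1.0, 0.0)
        # Wiener gain xi / (1 + xi), bounded below by the floor.
        gains.append(max(xi / (1.0 + xi), floor))
    return gains


if __name__ == "__main__":
    # Three bins: speech-dominated, equal to noise, below the noise estimate.
    print(wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))
```

In such a scheme the gain would be multiplied onto the noisy short-term spectral amplitudes before resynthesis or feature extraction; bins dominated by noise are attenuated down to the floor, while bins with high estimated SNR pass nearly unchanged.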

Journal: The Journal of the Acoustical Society of America
Publication Date: Feb 1, 1999
Citations: 2

Similar Papers

Combining speech enhancement and auditory feature extraction for robust speech recognition
Michael Kleinschmidt ... Birger Kollmeier
Speech Communication | VOL. 34
14 Feb 2001

Deep Learning for Minimum Mean-Square Error and Missing Data Approaches to Robust Speech Processing
04 Dec 2020

Combined speech enhancement and auditory modelling for robust distributed speech recognition
Ronan Flynn ... Edward Jones
Speech Communication | VOL. 50
20 May 2008

A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement
Li Chai ... Jun Du
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 29
12 Nov 2020
