Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition

Bernd T Meyer,Birger Kollmeier

doi:10.1016/j.specom.2010.07.002

Abstract

The effect of bio-inspired spectro-temporal processing on automatic speech recognition (ASR) is analyzed for two different tasks, with a focus on the robustness of spectro-temporal Gabor features in comparison to mel-frequency cepstral coefficients (MFCCs). Experiments addressing extrinsic factors such as additive noise and changes of the transmission channel were carried out on a digit classification task (AURORA 2), for which spectro-temporal features were found to be more robust than the MFCC baseline against a wide range of noise sources. Intrinsic variations, i.e., changes in speaking rate, speaking effort and pitch, were analyzed on a phoneme recognition task with matched training and test conditions. The sensitivity of Gabor and MFCC features to various speaking styles was found to differ systematically. An analysis based on phoneme confusions for both feature types suggests that spectro-temporal and purely spectral features carry complementary information. The usefulness of the combined information was demonstrated in a system that combines both feature types, which yielded a decrease in word-error rate of 16% compared to the best single-stream recognizer and 47% compared to an MFCC baseline.
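
The abstract does not define the Gabor features themselves. As a rough illustration only, a spectro-temporal Gabor filter can be pictured as a two-dimensional complex sinusoid multiplied by a smooth envelope, correlated with a (frequency x time) spectrogram. The sketch below assumes a Gaussian envelope and uses hypothetical parameter names (`omega_f`, `omega_t`, `sigma_f`, `sigma_t`) and helper functions; it is not the filter set or envelope used by the authors.

```python
import cmath
import math


def gabor_kernel(n_freq, n_time, omega_f, omega_t, sigma_f, sigma_t):
    """Complex 2D Gabor kernel: Gaussian envelope times a complex sinusoid.

    Rows index spectral channels, columns index time frames.
    The Gaussian envelope and all parameter names are illustrative assumptions.
    """
    cf, ct = n_freq // 2, n_time // 2  # kernel centre
    kernel = []
    for i in range(n_freq):
        row = []
        for j in range(n_time):
            df, dt = i - cf, j - ct
            envelope = math.exp(-0.5 * ((df / sigma_f) ** 2 + (dt / sigma_t) ** 2))
            carrier = cmath.exp(1j * (omega_f * df + omega_t * dt))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(spec, kernel, f0, t0):
    """Real part of the kernel's correlation with `spec`, centred at (f0, t0).

    `spec` is a list of rows (frequency channels) of equal length (time frames).
    Samples that fall outside the spectrogram are treated as zero.
    """
    cf, ct = len(kernel) // 2, len(kernel[0]) // 2
    acc = 0.0
    for i, row in enumerate(kernel):
        for j, weight in enumerate(row):
            f, t = f0 + i - cf, t0 + j - ct
            if 0 <= f < len(spec) and 0 <= t < len(spec[0]):
                acc += spec[f][t] * weight.real
    return acc


if __name__ == "__main__":
    # Toy spectrogram: 5 channels x 7 frames with a single active bin.
    spec = [[0.0] * 7 for _ in range(5)]
    spec[2][3] = 1.0
    kernel = gabor_kernel(5, 7, omega_f=0.5, omega_t=0.3, sigma_f=1.5, sigma_t=2.0)
    print(filter_response(spec, kernel, f0=2, t0=3))
```

In this picture, varying `omega_f` and `omega_t` changes the spectral and temporal modulation frequencies the filter responds to, which is what distinguishes such features from purely spectral ones like MFCCs.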

Journal: Speech Communication
Publication Date: Jul 24, 2010
Citations: 69

Similar Papers

Novel Gammatone Filterbank Based Spectro-Temporal Features for Robust Phoneme Recognition
Ankit Nagpal ... Hemant A Patil
01 Jan 2017

Complementarity of MFCC, PLP and Gabor features in the presence of speech-intrinsic variabilities
Bernd T Meyer ... Birger Kollmeier
06 Sep 2009

Subband feature extraction using lapped orthogonal transform for speech recognition
Z Tufekci ... J.N Gowdy
07 May 2001

Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition
Marc René Schädler ... Bernd T Meyer
The Journal of the Acoustical Society of America | VOL. 131
01 May 2012
