Abstract

In this paper, we present robust feature extractors that incorporate a regularized minimum variance distortionless response (RMVDR) spectrum estimator, instead of the discrete Fourier transform-based direct spectrum estimator used in many front-ends including the conventional MFCC, to estimate the speech power spectrum. Direct spectrum estimators, e.g., the single-tapered periodogram, have high variance and perform poorly under noisy and adverse conditions. To reduce this performance degradation, we propose to increase the robustness of speech recognition systems by extracting features based on the regularized MVDR technique. The RMVDR spectrum estimator has low spectral variance and is robust to mismatched conditions. Based on the RMVDR spectrum estimator, we propose three robust acoustic front-ends: regularized MVDR-based cepstral coefficients (RMCC), robust RMVDR cepstral coefficients (RRMCC), and normalized RMVDR cepstral coefficients (NRMCC). In addition to the RMVDR spectrum estimator, the RRMCC and NRMCC front-ends also employ auditory-domain spectrum enhancement methods, namely the auditory spectrum enhancement (ASE) and medium-duration power bias subtraction (MDPBS) techniques, respectively, to further improve robustness. Speech recognition experiments are conducted on the AURORA-4 large vocabulary continuous speech recognition (LVCSR) corpus, and performance is compared with Mel frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP), MVDR spectrum estimator-based MFCC, perceptual MVDR (PMVDR), cochlear filterbank cepstral coefficients (CFCC), power normalized cepstral coefficients (PNCC), the ETSI advanced front-end (ETSI-AFE), and the robust feature extractor (RFE) of Alam et al. (2012).
Experimental results demonstrate that the proposed robust feature extractors outperform the other robust front-ends in terms of percentage word error rate on the AURORA-4 large vocabulary continuous speech recognition (LVCSR) task under both clean and multi-condition training. Under clean training, the RRMCC and NRMCC provide, on average, significant reductions in word error rate over the remaining front-ends. Under multi-condition training, the RMCC, RRMCC, and NRMCC perform slightly better, in terms of average word error rate, than the other front-ends used in this work.
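The core idea above, replacing the DFT-based periodogram with an MVDR spectral estimate, can be sketched as follows. This is a minimal NumPy illustration of the standard (non-regularized) MVDR spectrum, where the power at frequency ω is 1 / (e(ω)ᴴ R⁻¹ e(ω)) for the autocorrelation matrix R and steering vector e(ω); the model order, frequency grid, and diagonal-loading factor are illustrative assumptions, and the loading merely stands in loosely for the paper's regularization, which is not reproduced here.

```python
import numpy as np

def mvdr_spectrum(x, order=12, n_freqs=128):
    """Illustrative MVDR power spectrum estimate for one speech frame.

    Not the paper's regularized variant: a small diagonal loading term is
    used here only for numerical stability.
    """
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz autocorrelation matrix R
    R = np.array([[r[abs(i - j)] for j in range(order + 1)]
                  for i in range(order + 1)])
    # Diagonal loading (illustrative assumption, not the paper's method)
    R += 1e-6 * r[0] * np.eye(order + 1)
    R_inv = np.linalg.inv(R)

    freqs = np.linspace(0.0, np.pi, n_freqs)   # rad/sample
    k = np.arange(order + 1)
    spectrum = np.empty(n_freqs)
    for i, w in enumerate(freqs):
        e = np.exp(1j * w * k)                 # steering vector e(w)
        spectrum[i] = 1.0 / np.real(e.conj() @ R_inv @ e)
    return freqs, spectrum

# Usage: a sinusoid in white noise; the MVDR estimate peaks near the
# sinusoid's frequency with much lower variance than a raw periodogram.
rng = np.random.default_rng(0)
n = 512
w0 = 0.8                                        # true frequency, rad/sample
x = np.cos(w0 * np.arange(n)) + 0.1 * rng.standard_normal(n)
freqs, S = mvdr_spectrum(x)
peak_freq = freqs[np.argmax(S)]
```

In a full front-end such as RMCC, an estimate like this would replace the squared-magnitude DFT before Mel filterbank integration and cepstral analysis.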
