An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition

Yu Tsao, Chin-Hui Lee

doi:10.1109/tasl.2009.2016231

Abstract

We propose an ensemble speaker and speaking environment modeling (ESSEM) approach to characterizing environments in order to enhance performance robustness of automatic speech recognition systems under adverse conditions. The ESSEM process comprises two phases, the offline and the online. In the offline phase, we prepare an ensemble speaker and speaking environment space formed by a collection of super-vectors. Each super-vector consists of the entire set of means from all the Gaussian mixture components of a set of hidden Markov models that characterizes a particular environment. In the online phase, with the ensemble environment space prepared in the offline phase, we estimate the super-vector for a new testing environment based on a stochastic matching criterion. In this paper, we focus on methods for enhancing the construction and coverage of the environment space in the offline phase. We first demonstrate environment clustering and partitioning algorithms to structure the environment space well; then, we propose a minimum classification error training algorithm to enhance discrimination across environment super-vectors and therefore broaden the coverage of the ensemble environment space. We evaluate the proposed ESSEM framework on the Aurora2 connected digit recognition task. Experimental results verify that ESSEM provides clear improvement over a baseline system without environmental compensation. Moreover, the performance of ESSEM can be further enhanced by using well-structured environment spaces. Finally, we confirm that ESSEM gives the best overall performance with an environment space refined by an integration of all techniques.
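The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the new environment's super-vector is modeled as a weighted combination of the ensemble super-vectors, which the abstract does not state, and it swaps in an ordinary least-squares fit for the stochastic matching criterion. The names `build_supervector`, `estimate_supervector`, and `target` are hypothetical, and the toy data is random.

```python
import numpy as np


def build_supervector(gaussian_means):
    """Offline phase: concatenate all Gaussian mean vectors of one
    environment's HMM set into a single super-vector."""
    return np.concatenate([np.asarray(m, dtype=float).ravel() for m in gaussian_means])


def estimate_supervector(env_space, target):
    """Online phase (stand-in): fit combination weights by least squares.

    env_space has one row per environment super-vector. The paper uses a
    stochastic matching criterion on test data; least squares against a
    known target vector is an assumption made only for illustration.
    """
    weights, *_ = np.linalg.lstsq(env_space.T, target, rcond=None)
    return weights, env_space.T @ weights


# Toy ensemble: 3 environments, each with 2 Gaussians of dimension 2.
rng = np.random.default_rng(0)
environments = [[rng.normal(size=2) for _ in range(2)] for _ in range(3)]
env_space = np.stack([build_supervector(e) for e in environments])

# Hypothetical "new environment" built from known weights so the fit can be checked.
true_weights = np.array([0.5, 0.3, 0.2])
target = env_space.T @ true_weights

weights, estimate = estimate_supervector(env_space, target)
```

In a real system the target super-vector is not observed directly. The weights would instead be chosen to maximize the likelihood of the test utterances, which is the role the stochastic matching criterion plays in the abstract.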

Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication Date: Jul 1, 2009
Citations: 83

Similar Papers

Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling
Yu Tsao ... Satoshi Nakamura
03 Dec 2009

A General Approximation-Optimization Approach to Large Margin Estimation of HMMs
Hui Jiang ... Xinwei Li
01 Jun 2007

Compensation of SNR and noise type mismatch using an environmental sniffing based speech recognition solution
Yongjoo Chung ... John Hl Hansen
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2013
20 Jun 2013

Multi-environment model adaptation based on vector Taylor series for robust speech recognition
Yong Lü ... Zhenyang Wu
Pattern Recognition | VOL. 43
31 Mar 2010
