Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling

Yu Tsao, Chin-Hui Lee, Jinyu Li, Satoshi Nakamura

doi:10.1145/1667780.1667863

Abstract

Recently, we proposed an ensemble speaker and speaking environment modeling (ESSEM) approach to enhance the robustness of automatic speech recognition (ASR) under adverse conditions. The ESSEM framework comprises two phases: offline and online. In the offline phase, we prepare an environment structure formed by multiple sets of hidden Markov models (HMMs), each representing a particular speaker and speaking environment. In the online phase, ESSEM estimates a mapping function that transforms the prepared environment structure into a set of HMMs for the unknown testing condition. In this study, we incorporate soft margin estimation (SME) to increase the discriminative power of the environment structure in the offline phase and thereby enhance overall ESSEM performance. We evaluated the approach on the Aurora-2 connected digit database. With the SME-refined environment structure, ESSEM outperforms the original framework. Using our best online mapping function, ESSEM achieves a word error rate (WER) of 4.62%, a 14.60% relative reduction from the best baseline WER of 5.41%.
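
The relative WER reduction quoted in the abstract can be checked with a short calculation. The sketch below assumes the usual definition, (baseline - new) / baseline; the function name `relative_wer_reduction` is ours for illustration and does not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Assumes the usual definition: (baseline - new) / baseline * 100.
    Both arguments are WERs expressed in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# WER figures taken from the abstract: baseline 5.41%, SME-refined ESSEM 4.62%.
reduction = relative_wer_reduction(5.41, 4.62)
print(f"{reduction:.2f}%")  # prints "14.60%"
```

The absolute reduction is 0.79 percentage points; dividing by the 5.41% baseline gives the 14.60% relative figure.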
