Large-margin minimum classification error training: A theoretical risk minimization perspective

Dong Yu, Li Deng, Xiaodong He, Alex Acero

doi:10.1016/j.csl.2008.03.002

Abstract

Large-margin discriminative training of hidden Markov models has received significant attention recently. A natural and interesting question is whether the existing discriminative training algorithms can be extended directly to embed the concept of margin. In this paper, we answer this question affirmatively by showing that the sigmoid bias in conventional minimum classification error (MCE) training can be interpreted as a soft margin. We justify this claim from a theoretical classification risk minimization perspective, in which the loss function associated with a non-zero sigmoid bias is shown to include not only empirical error rates but also a margin-bound risk. Based on this perspective, we propose a practical optimization strategy that adjusts the margin (sigmoid bias) incrementally in the MCE training process so that a desirable balance between the empirical error rates on the training set and the margin can be achieved. We call this modified MCE training process large-margin minimum classification error (LM-MCE) training to differentiate it from conventional MCE. Speech recognition experiments have been carried out on two tasks. First, in the TIDIGITS recognition task, LM-MCE outperforms the state-of-the-art MCE method with 17% relative digit-error reduction and 19% relative string-error reduction. Second, on the Microsoft internal large-vocabulary telephony speech recognition task (with 2000 hours of training data and 120K words in the vocabulary), LM-MCE achieves a significant improvement in recognition accuracy, demonstrating that our formulation of LM-MCE can be successfully scaled up and applied to large-scale speech recognition tasks.
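
To make the soft-margin reading of the sigmoid bias concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common MCE convention in which the misclassification measure d is positive for an error and negative for a correct decision, and it assumes the bias enters as an additive shift `margin` inside a sigmoid with slope `alpha`. The function names, the parameterization, and the schedule values are all hypothetical; the abstract does not specify them.

```python
import math


def mce_sigmoid_loss(d, alpha=1.0, margin=0.0):
    """Smoothed 0-1 loss on a misclassification measure d.

    Assumed convention: d > 0 means the token is misclassified,
    d < 0 means it is classified correctly. A positive `margin`
    shifts the sigmoid so that a correct token needs d < -margin
    before its loss falls below 0.5.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (d + margin)))


def margin_schedule(start, stop, step):
    """Hypothetical incremental schedule of margins from start to stop."""
    n_steps = int(round((stop - start) / step))
    return [start + i * step for i in range(n_steps + 1)]


if __name__ == "__main__":
    d = -0.5  # a correctly classified token that lies close to the boundary
    for m in margin_schedule(0.0, 1.0, 0.5):
        print(f"margin={m:.1f}  loss={mce_sigmoid_loss(d, margin=m):.3f}")
```

With a zero margin, the token at d = -0.5 contributes a loss below 0.5. As the margin grows, the same token is penalized more heavily even though it is still classified correctly, which is the sense in which the bias acts as a soft margin under these assumptions.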

Journal: Computer Speech & Language
Publication Date: Mar 12, 2008
Citations: 60

