Abstract

We attack the difficult problem of optimizing a hidden Markov model (HMM)-based speech recognizer to minimize its misclassification rate. In conventional HMM recognizer design, the training data is divided into subsets of identically labeled tokens and the HMM for each label is designed from the corresponding subset using a maximum likelihood (ML) objective. However, ML is a mismatched objective: ML design does not minimize the recognizer's misclassification rate. The misclassification rate is difficult to optimize directly because the cost surface is riddled with shallow local minima that tend to trap naive descent methods. We propose an approach based on the powerful technique of deterministic annealing (DA) that minimizes the misclassification cost while avoiding shallow local minima. In the DA approach, the classifier's decision is randomized during design and its expected misclassification rate is minimized while enforcing a level of randomness measured by the Shannon entropy. The entropy constraint is gradually withdrawn (annealing), and in the limit the cost function converges to the misclassification rate of a regular, non-random recognizer. This algorithm is implementable by a low-complexity forward-backward procedure similar to the Baum-Welch re-estimation used in ML design. Our experiments on speaker-independent, isolated-word recognition of clean and noise-corrupted utterances of the difficult E-set letters {b, c, d, e, g, p, t, v, z} demonstrate that DA-designed recognizers offer consistent and substantial improvements in accuracy over ML-designed recognizers.
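To make the annealing mechanism concrete, the following is a minimal Python sketch of the design loop the abstract describes, under stated assumptions: unit-variance Gaussian log-likelihoods stand in for the per-class HMM discriminants, the randomized decision is a Gibbs distribution with inverse temperature gamma, and a finite-difference gradient replaces the paper's forward-backward procedure. The gamma schedule, step size, and toy data are illustrative assumptions, not the paper's actual settings.

import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class training set (stand-ins for labeled speech tokens).
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(+1.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Model parameters: one mean per class (unit-variance Gaussian scores
# replace the HMM log-likelihoods of the actual recognizer).
means = rng.normal(size=(2, 2))

def scores(means, X):
    """Per-class discriminants d_j(x): Gaussian log-likelihoods (up to a constant)."""
    return -0.5 * ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)

def expected_error(means, X, y, gamma):
    """Expected misclassification rate of the randomized (Gibbs) classifier.

    P(j|x) is proportional to exp(gamma * d_j(x)); small gamma means a nearly
    uniform (high-entropy) decision, and as gamma grows the rule hardens to
    argmax, so this cost converges to the ordinary error rate.
    """
    d = gamma * scores(means, X)
    d -= d.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(d)
    p /= p.sum(axis=1, keepdims=True)
    return 1.0 - p[np.arange(len(y)), y].mean()

# Annealing loop: descend the smooth expected-error surface at each
# temperature, then raise gamma (withdraw the entropy constraint) and repeat.
step, eps = 1.0, 1e-4
for gamma in [0.1, 0.3, 1.0, 3.0, 10.0]:     # illustrative schedule
    for _ in range(200):
        # Finite-difference gradient; the actual design uses an analytic,
        # forward-backward-style gradient as noted in the abstract.
        grad = np.zeros_like(means)
        base = expected_error(means, X, y, gamma)
        for idx in np.ndindex(means.shape):
            m = means.copy()
            m[idx] += eps
            grad[idx] = (expected_error(m, X, y, gamma) - base) / eps
        means -= step * grad
    print(f"gamma={gamma:5.1f}  expected error={expected_error(means, X, y, gamma):.3f}")

# Hard-decision error rate of the final non-random recognizer,
# the limit of the annealed cost as gamma grows without bound.
print(f"hard error = {(scores(means, X).argmax(axis=1) != y).mean():.3f}")

The key design choice the sketch illustrates is that at low gamma the expected-error surface is smooth and its shallow local minima are washed out, while raising gamma gradually recovers the true misclassification rate of a deterministic classifier.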
