Abstract

Describes a framework for optimising the structure and parameters of a continuous-density HMM-based large-vocabulary speech recognition system using the maximum mutual information estimation (MMIE) criterion. To reduce the computational complexity of the MMIE training algorithm, confusable segments of speech are identified and stored as word lattices of alternative utterance hypotheses. An iterative mixture splitting procedure is also employed to adjust the number of mixture components in each state during training such that the optimal balance between number of parameters and available training data is achieved. Experiments are presented on various test sets from the Wall Street Journal database using the full SI-284 training set. These show that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets. Furthermore, experimental results demonstrate that MMIE optimisation of system structure and parameters can yield useful increases in recognition accuracy.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call