Abstract

In this paper, we report our development of context-dependent allophonic hidden Markov models (HMMs) implemented in a 75 000-word speaker-dependent Gaussian-HMM recognizer. The context explored is the immediate left and/or right adjacent phoneme. To achieve reliable estimation of the model parameters, phonemes are grouped into classes based on their expected co-articulatory effects on neighboring phonemes. Only five separate preceding and following contexts are identified explicitly for each phoneme. By grouping the contexts we ensure that they occur frequently enough in the training data to allow reliable estimation of the parameters of the HMM representing the context-dependent units. Further improvement in the estimation reliability is obtained by tying the covariance matrices in the HMM output distributions across all contexts. Speech recognition experiments show that when a large amount of data (e.g. over 2500 words) is used to train context-dependent HMMs, the word recognition error rate is reduced by 33%, compared with the context-independent HMMs. For smaller amounts of training data the error reduction becomes less significant.
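The two estimation ideas in the abstract, grouping contexts into classes and tying covariances across contexts, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not list the five context classes, so the `CONTEXT_CLASS` table, the `sil` boundary marker, the `left-phone+right` label format, and the helper names `context_unit`, `label_units`, and `tied_variance` are hypothetical. The variance function is a scalar simplification of a tied covariance matrix, not the authors' training procedure.

```python
from collections import Counter

# Hypothetical five-way grouping of phonemes by broad articulatory class.
# The abstract does not name its classes; these are illustrative only.
CONTEXT_CLASS = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar",
    "k": "velar", "g": "velar",
    "iy": "front_vowel", "eh": "front_vowel", "ae": "front_vowel",
    "aa": "back_vowel", "uw": "back_vowel", "ow": "back_vowel",
}
BOUNDARY = "sil"  # assumed marker for utterance edges and unknown phonemes


def context_unit(left, phone, right):
    """Name a context-dependent unit by the classes of its neighbors."""
    left_class = CONTEXT_CLASS.get(left, BOUNDARY)
    right_class = CONTEXT_CLASS.get(right, BOUNDARY)
    return f"{left_class}-{phone}+{right_class}"


def label_units(phones):
    """Map a phoneme sequence to class-based context-dependent units."""
    units = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        units.append(context_unit(left, phone, right))
    return units


def tied_variance(samples_by_context):
    """Estimate one variance shared by all contexts (scalar case).

    Each context keeps its own mean; squared deviations from those means
    are pooled, so sparse contexts borrow strength from frequent ones.
    """
    total_sq_dev = 0.0
    total_count = 0
    for samples in samples_by_context.values():
        mean = sum(samples) / len(samples)
        total_sq_dev += sum((x - mean) ** 2 for x in samples)
        total_count += len(samples)
    return total_sq_dev / total_count  # maximum-likelihood (divide by N)


# "b ae d" and "p ae t" yield the same unit for "ae", so their
# training frames would be pooled into one context-dependent model.
counts = Counter(label_units(["b", "ae", "d"]) + label_units(["p", "ae", "t"]))
shared = tied_variance({"ctx_a": [1.0, 3.0], "ctx_b": [10.0, 12.0]})
```

In this sketch the vowel in both toy words maps to the single unit `labial-ae+alveolar`, which is the effect the grouping is meant to achieve: fewer distinct units, each seen more often in training.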
