Abstract

A new phone recognizer has been implemented which extends the phonotactic decoding constraint to sequences of three phones. It is based on a structure similar to a second order ergodic hidden Markov model (HMM). This kind of model assumes a direct correspondence between the model states and phones, so constraints on possible state sequences are equivalent to phonotactic constraints. Very high coverage by both left and right context dependent phone models has been achieved using two methods. The first assumes that some contexts have the same or a very similar effect on the phone in question, so such contexts are merged into the same contextual class. The outcome is a set of 19 left context classes and 18 right context classes. The second assumes that the left context mostly influences the beginning of a phone, whereas the right context mostly influences the end of the phone. Each phone (a state in the ergodic HMM) is represented by a sequence of three probability density functions (pdfs), which is equivalent to a three state left-to-right HMM. We generate acoustic models such that the first pdf in the model is conditioned on the left context, the middle pdf is context independent, and the last pdf is conditioned on the right context. A large number of such quasi-triphonic acoustic models can be generated, providing good triphone coverage for a given task while using the available training data efficiently. The current implementation of the recognizer described here has been applied to the DARPA Resource Management Task. Since true phone sequences are not available, they are estimated from text using a phone realization regression tree trained on TIMIT database transcriptions. The estimates of the true phone sequences are used in training the models and in generating reference phone sequences for scoring. The best phone recognition match between the most likely output of the regression tree and the phone recognizer for the DARPA February 89 test set was 75.5% correct with 79.5% accuracy.
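
The composition of a quasi-triphonic model from shared pdfs can be illustrated with a minimal Python sketch. It is not the recognizer's implementation: the names `QuasiTriphone`, `build_quasi_triphone`, `pdfs_per_phone`, and `models_per_phone`, the pdf labels, and the example context classes ("velar", "alveolar") are all hypothetical, and the counts assume that every phone can occur with every one of the 19 left and 18 right context classes.

```python
from typing import Dict, NamedTuple


class QuasiTriphone(NamedTuple):
    """Three pdf labels forming a three state left-to-right model."""
    left_pdf: str    # conditioned on the left context class
    middle_pdf: str  # context independent
    right_pdf: str   # conditioned on the right context class


def build_quasi_triphone(
    phone: str,
    left: str,
    right: str,
    left_class: Dict[str, str],
    right_class: Dict[str, str],
) -> QuasiTriphone:
    """Assemble a model for `phone` in the context `left` _ `right`.

    `left_class` and `right_class` map a neighbouring phone to its
    (hypothetical) context class; contexts in the same class share a pdf.
    """
    lc = left_class[left]
    rc = right_class[right]
    return QuasiTriphone(
        left_pdf=f"{phone}.begin|L={lc}",
        middle_pdf=f"{phone}.mid",
        right_pdf=f"{phone}.end|R={rc}",
    )


def pdfs_per_phone(n_left: int, n_right: int) -> int:
    """Distinct pdfs to train per phone: one per left class, one middle, one per right class."""
    return n_left + 1 + n_right


def models_per_phone(n_left: int, n_right: int) -> int:
    """Distinct quasi-triphonic models that those pdfs can be combined into."""
    return n_left * n_right


# Hypothetical context classes, for illustration only.
LEFT = {"k": "velar", "g": "velar", "t": "alveolar"}
RIGHT = {"t": "alveolar", "d": "alveolar", "k": "velar"}

m1 = build_quasi_triphone("ae", "k", "t", LEFT, RIGHT)
m2 = build_quasi_triphone("ae", "g", "d", LEFT, RIGHT)
```

Under these assumptions, `m1` and `m2` are the same model because "k" and "g" fall in one left class and "t" and "d" in one right class. With 19 left and 18 right classes, each phone needs at most 19 + 1 + 18 = 38 pdfs, which combine into up to 19 × 18 = 342 models, and this is why coverage can be high without a matching growth in the number of parameters to train.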
