Abstract

Most current state-of-the-art speech recognition systems use the hidden Markov model (HMM) to model the acoustic characteristics of a speech signal. The first-order HMM assumes that speech data are independent and identically distributed (iid), i.e., that there is no dependency between neighboring feature vectors, and that the current vector depends only on the current HMM state. In practice, however, neither assumption holds. We describe a hybrid HMM/BN (Bayesian network) acoustic model in which the dependency of the current speech vector on the previous vector and on the previous state is also learned and used in speech recognition. This is possible because the state probability distribution is modeled by a BN: the previous instances of the state and the speech feature vector are represented by additional BN variables, and the probabilistic dependencies between them and the current instances are learned during training. During recognition, the likelihood of the current feature vector is inferred from the BN, with the previous state and the previous feature vector treated as hidden. We evaluated this hybrid HMM/BN model with our LVCSR system on phoneme recognition and on large-vocabulary continuous word recognition tasks, and in both cases observed improved performance over the conventional Gaussian mixture HMM.
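As a rough illustration (not the paper's actual implementation), the sketch below shows one way such a state likelihood could be computed: each HMM state's output distribution is a BN node conditioned on the previous state and the previous feature vector, and the hidden previous state is marginalised out at recognition time. The conditional linear-Gaussian form, the uniform prior over the previous state, and all names (bn_likelihood, mu, A, etc.) are assumptions made here for illustration; for simplicity the sketch conditions on the previous vector rather than also marginalising it.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy dimensions; all parameter names below are hypothetical.
N_STATES, DIM = 3, 2
rng = np.random.default_rng(0)

# For each (current state q, previous state q_prev) pair: a conditional
# linear-Gaussian over the current feature vector x_t, whose mean shifts
# with the previous feature vector x_prev.
mu = rng.normal(size=(N_STATES, N_STATES, DIM))             # base means
A = 0.1 * rng.normal(size=(N_STATES, N_STATES, DIM, DIM))   # regression on x_prev
cov = np.tile(np.eye(DIM), (N_STATES, N_STATES, 1, 1))      # covariances
prev_prior = np.full(N_STATES, 1.0 / N_STATES)              # P(q_prev), assumed uniform

def bn_likelihood(x_t, x_prev, q):
    """p(x_t | q, x_prev): the hidden previous state q_prev is marginalised
    out, mirroring the recognition-time inference described in the abstract."""
    total = 0.0
    for q_prev in range(N_STATES):
        mean = mu[q, q_prev] + A[q, q_prev] @ x_prev
        total += prev_prior[q_prev] * multivariate_normal.pdf(
            x_t, mean=mean, cov=cov[q, q_prev])
    return total

# Example: score one frame given its predecessor, for HMM state q = 1.
x_prev, x_t = rng.normal(size=DIM), rng.normal(size=DIM)
print(bn_likelihood(x_t, x_prev, q=1))
```

In a full decoder this likelihood would replace the Gaussian-mixture state output probability inside the usual Viterbi recursion; the BN structure and its conditional distributions would be learned from data rather than drawn at random as in this toy.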
