Abstract
We present a novel method for incorporating temporal correlations into a speech recognition system based on conventional hidden Markov models (HMMs). Temporal correlations are useful for recognition because the speech features of the present frame are highly informative about the features of neighboring frames. We treat these correlations as conditional probability distributions (PDs) and propose a new technique for incorporating frame correlations. The proposed method, called the extended logarithmic pool (ELP), approximates a joint conditional PD by separate conditional PDs, one for each condition. We provide a constrained optimization algorithm for finding the optimal pooling weights. For practical use, we also suggest methods for obtaining robust estimates of the PDs that characterize frame correlation. In addition, to improve model discriminability, we introduce a technique that combines two kinds of PDs through their exponents. In speaker-independent continuous speech recognition experiments, the proposed approaches reduce errors by up to 20.5% compared with the conventional bigram-constrained (BC) HMM method.
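As a rough illustration of the pooling idea only: a standard logarithmic opinion pool combines several distributions over the same outcomes as a normalized, weighted geometric mean. The sketch below shows that standard form on discrete distributions. It is not the paper's ELP, whose extension and constrained weight optimization are not specified in this abstract; the function name `log_pool`, the toy distributions, and the equal weights are all assumptions made for the example.

```python
import math

def log_pool(dists, weights):
    """Standard logarithmic pool of discrete distributions (hypothetical helper).

    Each entry of `dists` is a list of strictly positive probabilities over
    the same symbols; `weights` holds one exponent per distribution.
    """
    n = len(dists[0])
    # Weighted sum of log-probabilities for each symbol.
    log_scores = [
        sum(w * math.log(d[k]) for d, w in zip(dists, weights))
        for k in range(n)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_scores)
    unnorm = [math.exp(s - m) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Toy conditional PDs over three quantized feature symbols (made-up values):
# one conditioned on the previous frame, one on the next frame.
p_given_prev = [0.7, 0.2, 0.1]
p_given_next = [0.5, 0.4, 0.1]

# Equal weights are an arbitrary choice here, not optimized values.
pooled = log_pool([p_given_prev, p_given_next], [0.5, 0.5])
```

Working in the log domain keeps the product of powers numerically stable and makes the weights act as exponents on the individual PDs.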