Abstract

Many speech recognition systems use multiple information streams to compute HMM output probabilities (e.g., systems based on semicontinuous or discrete HMMs use one codebook for the cepstral coefficients and another for the delta cepstral coefficients). The final score is a weighted sum of the contributions of all streams. These weights can be found empirically, and usually the same set of weights is used for every acoustic model. There is reason to believe, however, that some features are more important for some acoustic models than for others. In particular, one would expect the beginning and end segments of a phoneme to be more context-dependent than its middle part, so for those segments the recognizer's probability estimator should put more emphasis on the delta spectrum than on the spectrum. Experiments have shown that spectral or cepstral coefficients are more important than their derivatives and more important than power or delta-power coefficients. We propose an algorithm for learning individual stream weights for every HMM state. Since these state-dependent weights include the stream-only dependent weights as a special case, they can reproduce the results obtained with stream-only dependent weights and, additionally, discriminate between HMM states. Thus, recognition performance must improve.
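The scoring described above can be sketched as follows. This is a minimal illustration, assuming each stream's "contribution" is its log output probability for the current frame (a common convention, not stated in the abstract). The function names, the stream order (cepstrum, delta cepstrum, power), and all numbers are hypothetical, and the sketch shows only how stream scores are combined, not the proposed weight-learning algorithm.

```python
from typing import Dict, Sequence

# Assumption: stream contributions are per-stream log output probabilities,
# listed in a hypothetical fixed order: (cepstrum, delta cepstrum, power).


def combined_log_score(stream_log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-stream log output probabilities for one HMM state."""
    if len(stream_log_probs) != len(weights):
        raise ValueError("expected one weight per stream")
    return sum(w * lp for w, lp in zip(weights, stream_log_probs))


def score_stream_weights(
    log_probs: Dict[str, Sequence[float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Stream-only dependent weights: one weight vector shared by every state."""
    return {state: combined_log_score(lps, weights) for state, lps in log_probs.items()}


def score_state_weights(
    log_probs: Dict[str, Sequence[float]], weights: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """State-dependent weights: each HMM state has its own weight vector."""
    return {state: combined_log_score(lps, weights[state]) for state, lps in log_probs.items()}


if __name__ == "__main__":
    # Hypothetical per-stream log-probabilities of one frame under two states.
    log_probs = {"begin": [-2.0, -1.0, -3.0], "middle": [-1.5, -2.5, -3.0]}
    global_w = [1.0, 0.5, 0.25]
    # Hypothetical per-state weights: the "begin" state stresses the delta
    # stream, the "middle" state stresses the static cepstrum.
    state_w = {"begin": [0.5, 1.0, 0.25], "middle": [1.0, 0.25, 0.25]}
    print(score_stream_weights(log_probs, global_w))  # {'begin': -3.25, 'middle': -3.5}
    print(score_state_weights(log_probs, state_w))    # {'begin': -2.75, 'middle': -2.875}
```

Giving every state the same vector as `global_w` reproduces the stream-only scores exactly, which is the special-case argument the abstract makes; any other choice of per-state vectors lets the scores differ between states.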
