Abstract

In speech recognition, the state emission probability distributions of hidden Markov models are traditionally modeled as continuous random variables using Gaussian mixtures. However, individual Gaussians are unimodal and therefore cannot accurately capture complex multimodal inter-feature dependencies; Gaussian mixtures can approximate such cases, but only in a loose and inefficient way. Graphical models provide a simple and precise mechanism for modeling the dependencies among two or more variables. This paper proposes using discrete random variables as observations, together with graphical models, to extract the internal dependency structure of the feature vectors. To keep the model tractable, the speech features are quantized to a small number of levels. These quantized speech features also increase robustness against noise uncertainty. In addition, discrete random variables make it possible to learn joint statistics of the observation densities. The paper presents a method for estimating a graphical model with a constrained number of dependencies, which constitutes a special kind of Bayesian network. Experimental results show that this modeling achieves better performance than standard baseline systems.
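As a rough illustration of the two ingredients described in the abstract, the Python sketch below quantizes continuous feature vectors to a few discrete levels and then estimates a dependency structure with a bounded number of edges, here via a Chow-Liu-style maximum spanning tree over pairwise mutual information. The tree construction, the equal-frequency binning, the level count, and all function names are illustrative assumptions standing in for the paper's actual estimator, which is not specified in the abstract.

```python
# Minimal sketch (assumed, not the paper's exact method): quantize continuous
# features to a few discrete levels, then bound the number of modeled
# dependencies with a Chow-Liu-style maximum spanning tree (one parent per
# feature dimension).
import numpy as np
from itertools import combinations

def quantize(features, n_levels=4):
    """Map each feature dimension to n_levels symbols via equal-frequency bins."""
    quantized = np.empty_like(features, dtype=np.int64)
    for d in range(features.shape[1]):
        # Interior quantile cut points for this dimension.
        edges = np.quantile(features[:, d], np.linspace(0, 1, n_levels + 1)[1:-1])
        quantized[:, d] = np.digitize(features[:, d], edges)
    return quantized

def mutual_information(x, y, n_levels):
    """Empirical mutual information between two discrete variables."""
    joint = np.zeros((n_levels, n_levels))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum())

def chow_liu_edges(quantized, n_levels):
    """Kruskal-style maximum spanning tree over pairwise mutual information,
    keeping at most D-1 dependencies among D feature dimensions."""
    D = quantized.shape[1]
    scored = sorted(
        ((mutual_information(quantized[:, i], quantized[:, j], n_levels), i, j)
         for i, j in combinations(range(D), 2)),
        reverse=True)
    parent = list(range(D))          # union-find forest for cycle detection
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    edges = []
    for mi, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:                 # keep the edge only if it adds no cycle
            parent[ri] = rj
            edges.append((i, j, mi))
    return edges

# Toy usage on synthetic "frames x dimensions" data with one real dependency.
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 6))
frames[:, 1] += 0.8 * frames[:, 0]
q = quantize(frames, n_levels=4)
print(chow_liu_edges(q, n_levels=4)[:3])
```

In this sketch the strongest recovered edge should connect dimensions 0 and 1, the pair with an injected dependency; a real system would score edges on quantized acoustic features (e.g., cepstral coefficients) and use the resulting structure to define the discrete emission distributions.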
