Abstract

In continuous-density hidden Markov models (HMMs) for speech recognition, the probability density function (PDF) for each state is usually expressed as a mixture of Gaussians. We present a model in which the PDF is expressed as the convolution of two densities. We focus on the special case where one of the convolved densities is an M-Gaussian mixture and the other is a mixture of N impulses. We present the reestimation formulae for the parameters of the M×N convolutional model and suggest two ways of initializing them: the residual K-means approach, and deconvolution from a standard HMM with MN Gaussians per state, using a genetic algorithm to search for the optimal assignment of Gaussians. Both methods result in a compact representation that requires only O(M+N) storage space for the model parameters and O(MN) time for training and decoding. We explain how the decoding time can be reduced to O(M+kN), where k<M. Finally, results are shown on the 1996 Hub-4 development test, demonstrating that a 32×2 convolutional model can achieve performance comparable to that of a standard 64-Gaussian-per-state model.
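To make the M×N construction concrete: convolving a Gaussian with an impulse at offset d simply shifts the Gaussian's mean by d, so convolving an M-Gaussian mixture with an N-impulse mixture yields an effective mixture of MN Gaussians whose means are all pairwise sums mu_i + d_j, while only M+N components' worth of parameters are stored. The following one-dimensional sketch (illustrative only; all parameter values and function names are assumptions, not taken from the paper) shows the O(M+N) storage and O(MN) evaluation cost:

```python
import numpy as np

def convolutional_density(x, w, mu, var, v, d):
    """Evaluate the convolutional state PDF at scalar x.

    The PDF is the convolution of an M-component Gaussian mixture
    (weights w, means mu, variances var) with an N-impulse mixture
    (weights v, offsets d).  Convolving with an impulse only shifts
    the mean, so the result is an effective MN-Gaussian mixture,
    yet only O(M + N) parameters are stored.
    """
    # (M, N) grid of shifted means mu_i + d_j and product weights w_i * v_j.
    means = mu[:, None] + d[None, :]
    weights = w[:, None] * v[None, :]
    # Gaussian normalization depends only on the i-th variance.
    norm = 1.0 / np.sqrt(2.0 * np.pi * var)[:, None]
    dens = norm * np.exp(-0.5 * (x - means) ** 2 / var[:, None])
    return float(np.sum(weights * dens))      # O(M * N) work per frame

# Toy 2x2 example (parameters chosen for illustration).
w = np.array([0.6, 0.4]); mu = np.array([-1.0, 1.0]); var = np.array([0.5, 0.8])
v = np.array([0.7, 0.3]); d = np.array([0.0, 2.0])
```

Since both component weight vectors sum to one, the product weights also sum to one and the result is a proper density; this is the structure the reestimation formulae and the O(M+kN) pruned decoding in the paper exploit.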

