Abstract

This paper presents experiments on, and discusses, how some neural network algorithms can help to improve phoneme recognition with mixture density hidden Markov models (MDHMMs). In MDHMMs, the stochastic observation process associated with each state is modelled by estimating the probability density function of the short-time observations in that state as a mixture of Gaussian densities. Learning Vector Quantization (LVQ) is used to increase the discrimination between different phoneme models, both during the initialization of the Gaussian codebooks and during the actual MDHMM training. The Self-Organizing Map (SOM) is applied to provide a suitably smoothed mapping of the training vectors, which accelerates the convergence of the actual training. The resulting codebook topology can also be exploited in the recognition phase to speed up the calculations that approximate the observation probabilities. The experiments with LVQ and SOMs show reductions both in the average phoneme recognition error rate and in the computational load compared with maximum likelihood training and Generalized Probabilistic Descent (GPD). The lowest final error rate, however, is obtained by using several training algorithms successively. Relative to the online system, additional reductions of about 40% in the error rate are obtained by using the same training methods with more advanced, higher-dimensional feature vectors.
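The two building blocks named above, a per-state mixture of Gaussian densities and an LVQ codebook update, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes diagonal covariances, the basic LVQ1 rule (the paper may use a different LVQ variant), and a hypothetical learning rate `alpha`. All function and parameter names are illustrative.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x (assumed form)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_mixture_density(x, weights, means, variances):
    """Log of sum_k w_k * N(x; mean_k, var_k), using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def lvq1_step(x, label, codebook, labels, alpha=0.05):
    """One LVQ1 update on the codebook, in place.

    The nearest codebook vector moves toward x when its label matches
    `label`, and away from x otherwise. Returns the index of that vector.
    `alpha` is a hypothetical learning rate, not a value from the paper.
    """
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook]
    k = dists.index(min(dists))
    sign = 1.0 if labels[k] == label else -1.0
    codebook[k] = [ci + sign * alpha * (xi - ci) for xi, ci in zip(x, codebook[k])]
    return k
```

In this sketch the codebook vectors would serve as the Gaussian means of the mixtures, so an LVQ pass over labelled training vectors pushes the means of competing phoneme models apart before or during MDHMM training.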
