Abstract
This paper integrates the traditional combination of a vector quantizer (VQ) and discrete hidden Markov models (HMMs) into the mixture emission density framework commonly used in automatic speech recognition (ASR). It is shown that the probability density of a system consisting of a VQ and a discrete classifier can be interpreted as a special case of a semi-continuous mixture model. Thus, the VQ parameters and the classifier can be trained jointly. Within this framework, a gradient-based VQ training method is derived for single and multiple feature stream systems. This leads to an approach that is directly related to the paradigm of maximum mutual information (MMI) neural networks, which has previously been applied successfully as a VQ in ASR. In continuous speech recognition experiments on the Resource Management and Wall Street Journal databases, the presented systems achieve recognition accuracies that compete well with those of comparable Gaussian mixture HMMs. Thus, we demonstrate that the performance degradations often reported for discrete HMM systems are not mainly caused by the vector quantization process itself, but by the traditional separation of the VQ and the HMM during parameter estimation. These degradations can be avoided by training the entire system as described here, while retaining the attractive computational speed of discrete HMMs.
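The special-case claim can be illustrated with a minimal sketch. It assumes a Euclidean nearest-neighbour VQ and a single HMM state with discrete output probabilities `weights`. All names and numbers below are hypothetical and not taken from the paper. The sketch only checks that a table lookup on the VQ label equals a semi-continuous mixture sum whose shared component scores are one-hot. The `soft_scores` function is an assumed softmax relaxation, included only to suggest why a mixture view makes the codebook reachable by gradients; it is not the training method derived in the paper.

```python
import math

def nearest_codeword(x, codebook):
    """Hard VQ: index of the codebook vector closest to x (squared Euclidean)."""
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook]
    return dists.index(min(dists))

def discrete_emission(x, codebook, weights):
    """Discrete HMM emission: look up the state's probability of the VQ label."""
    return weights[nearest_codeword(x, codebook)]

def mixture_emission(weights, component_scores):
    """Semi-continuous form: state-specific weights over shared components."""
    return sum(w * s for w, s in zip(weights, component_scores))

def hard_scores(x, codebook):
    """One-hot component scores: 1 for the winning codeword, 0 elsewhere."""
    k = nearest_codeword(x, codebook)
    return [1.0 if j == k else 0.0 for j in range(len(codebook))]

def soft_scores(x, codebook, temperature):
    """Assumed relaxation: softmax over negative squared distances."""
    logits = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / temperature
              for c in codebook]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values: three codewords, one state's output distribution.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
weights = [0.2, 0.5, 0.3]
x = [0.9, 1.2]

p_discrete = discrete_emission(x, codebook, weights)
p_mixture = mixture_emission(weights, hard_scores(x, codebook))
p_soft = mixture_emission(weights, soft_scores(x, codebook, 0.01))
```

With the toy values above, `p_discrete` and `p_mixture` coincide because the one-hot scores select exactly the weight that the lookup returns. As the temperature shrinks, `p_soft` approaches the same value while remaining a smooth function of the codebook entries.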