Applying dynamic context into MLP/HMM speech recognition system

Petri Salmela

doi:10.1006/csla.2001.0167

Abstract

This paper discusses the generation of a context description for an MLP/HMM speech recognition system such that the time span of the context can vary dynamically during both training and recognition. The context description is obtained by sliding a window over the feature vector sequence and classifying the feature vectors in the window with a codebook. The classification results are stored in a matrix that initially contains only zeros and has as many elements as there are codevectors in the codebook. Since each matrix element corresponds to exactly one codevector, the classification result of each feature vector inside the window can be indicated by setting the corresponding element to one. The context becomes dynamic when the window contains more feature vectors than the number of matrix elements that are allowed to be one at the same time. When the dynamic context was included in the MLP/HMM recognition system, the string recognition accuracy on the test set increased from 92.9% to 93.8% on average. This test set contained 29 188 Finnish digit strings. Half of the test set had a signal-to-noise ratio (SNR) of 20 dB, while the rest had an average SNR of −1.7 dB.
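The sliding-window context construction described above can be sketched in Python with NumPy. This is a minimal sketch, not the author's implementation, and it rests on assumptions the abstract does not state: codebook classification is nearest-neighbour by Euclidean distance, the window ends at the current frame, the matrix is flattened into one binary row of length C per frame, and, when the window holds more feature vectors than `max_active` allows, the most recent codevectors are kept. The function name `dynamic_context` and its parameters are hypothetical.

```python
import numpy as np


def dynamic_context(features, codebook, window_len, max_active):
    """Binary dynamic-context rows, one per frame (hypothetical sketch).

    features:   (T, D) array of feature vectors
    codebook:   (C, D) array of codevectors
    window_len: number of frames in the sliding window (assumed to end at frame t)
    max_active: maximum number of elements allowed to be one at the same time
    """
    # Classify every frame to its nearest codevector (Euclidean distance assumed).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)

    T, C = len(features), len(codebook)
    # One all-zero row of C elements per frame: the flattened "matrix".
    context = np.zeros((T, C), dtype=np.int8)
    for t in range(T):
        start = max(0, t - window_len + 1)
        # Assumed selection rule: scan from the newest frame backwards and stop
        # once max_active distinct codevectors have been marked.
        for label in labels[start:t + 1][::-1]:
            if context[t].sum() >= max_active:
                break
            context[t, label] = 1
    return context


feats = np.array([[0.0], [0.1], [1.0], [2.0], [2.1]])
cbook = np.array([[0.0], [1.0], [2.0]])
print(dynamic_context(feats, cbook, window_len=4, max_active=2).tolist())
```

Under this selection rule, a run of frames that map to the same codevector occupies only one active element, so the context can reach further back in time; rapidly changing input fills the `max_active` slots from only the most recent frames. In the example, frame 4 still has a frame classified to codevector 0 inside its window, but that codevector is left out because the two more recent codevectors already use both slots.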
