This paper introduces a speech feature grouping algorithm for speech recognition systems based on the traditional hidden Markov model (HMM), motivated by the heavy computation required by the traditional HMM, the Viterbi algorithm, and the Gaussian mixture distribution probability. The speech characteristic parameters are clustered with the K-Means algorithm on the basis of a first and a second segmentation, which yields the grouped characteristic parameters and the parameters to be grouped; the speech samples are then divided into different characteristic groups according to these two sets of parameters. On this basis, a grouping training algorithm that exploits redundancy is proposed, which improves the accuracy with which the clustering algorithm groups the speech characteristics. Compared with the traditional HMM method, the amount of calculation is reduced by more than 60% while the speech recognition rate is maintained.

In a traditional HMM speech recognition system for isolated words, the classic Viterbi algorithm solves the decoding problem with an iterative approach. In practical large-vocabulary acoustic decoding, however, the amount of computation the algorithm requires is considerable. Suppose a speech recognition system can recognize a vocabulary of 1000 words, that a model is established for each word, and that all models have the same number of states N. To facilitate estimation and calculation, these models must be connected into one large model, so the large model has 1000 times as many states as the original model. Because the computation required by the Viterbi algorithm is of order $N^2 T$ (where T is the number of input speech frames), the computation of the Viterbi algorithm for the large model is four orders of magnitude greater than that for the original phoneme model.
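As a rough illustration of where the $N^2 T$ cost comes from, the following is a minimal log-domain Viterbi sketch in Python. It is not the paper's implementation: the function name `viterbi`, the argument layout, and the toy two-state numbers are assumptions chosen only for illustration. For every frame, each of the N states is compared against all N predecessor states, which gives the nested loop that dominates the running time.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Return (best_log_prob, best_path) for one observation sequence.

    log_init[i]     : log probability of starting in state i
    log_trans[i][j] : log probability of moving from state i to state j
    log_emit[t][j]  : log likelihood of frame t under state j
    All names and the data layout are illustrative assumptions.
    """
    n_states = len(log_init)
    n_frames = len(log_emit)
    # Initialisation: score of each state after the first frame.
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, n_frames):          # T - 1 remaining frames
        new_delta, ptr = [], []
        for j in range(n_states):         # N target states
            # N candidate predecessors per target state: N * N work per frame.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state to recover the state sequence.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path

# Toy two-state example (hypothetical numbers, three frames).
lg = math.log
toy_init = [lg(0.6), lg(0.4)]
toy_trans = [[lg(0.7), lg(0.3)], [lg(0.4), lg(0.6)]]
toy_emit = [[lg(0.5), lg(0.1)], [lg(0.4), lg(0.3)], [lg(0.1), lg(0.6)]]
toy_score, toy_path = viterbi(toy_init, toy_trans, toy_emit)
```

In this sketch the emission scores are passed in precomputed; in a Gaussian mixture system, filling `log_emit` for every state and frame is itself a large share of the work.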
Moreover, experiments showed that most of the computation is spent on calculating the multi-dimensional Gaussian (mixture) distribution probability of the observation vector for each state in every frame. The real-time requirement of a large-vocabulary HMM speech recognition system on a small mobile device is therefore difficult to meet because of the device's limited computing capability. To address the large amount of computation required by the Viterbi algorithm and the Gaussian mixture distribution, the paper presents a grouping model of speech characteristics: before the matching calculation is performed on the input speech, a grouping judgment is made first to determine the group to which the input speech belongs, and the HMM parameters of the input speech are then matched only against those of the speech in the same group. Combined with the improved HMM speech recognition system, this improves efficiency.
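The group-then-match idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it assumes each vocabulary word is summarised by one fixed-length feature vector, uses a plain K-Means pass instead of the two-stage segmentation and grouping training described in the paper, and takes a caller-supplied `score` function as a stand-in for the HMM matching step. All function names here are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, k, iters=20, seed=0):
    """Plain K-Means: returns k centroids after a fixed number of passes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def nearest_group(vec, centroids):
    """Grouping judgment: index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))

def build_groups(word_features, centroids):
    """Map each group index to the list of words assigned to it."""
    groups = {}
    for word, vec in word_features.items():
        groups.setdefault(nearest_group(vec, centroids), []).append(word)
    return groups

def recognize(input_vec, centroids, groups, score):
    """Score only the words in the input's group.

    `score(word, input_vec)` is a placeholder for the HMM matching step
    and must return a larger value for a better match.
    """
    candidates = groups.get(nearest_group(input_vec, centroids), [])
    if not candidates:
        return None
    return max(candidates, key=lambda w: score(w, input_vec))
```

With G groups of roughly equal size, `recognize` calls `score` for about 1/G of the vocabulary instead of all of it, which is the source of the saving. The trade-off is that a wrong grouping judgment excludes the correct word before any matching takes place.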