Abstract
A new method for continuous Mandarin speech recognition based on a modular recurrent neural network (MRNN) is proposed. The system uses five RNNs, each handling a separate subtask, and combines their outputs to solve the overall recognition problem. The modules comprise two RNNs for discriminating the two sub-syllable groups of 100 right-final-dependent (RFD) initials and 39 context-independent (CI) finals, two RNNs for generating the dynamic weighting functions used to integrate the sub-syllables, and one RNN for syllable boundary detection. All RNN modules are combined using a delay-decision Viterbi search. The method differs from the ANN/HMM hybrid approach in that ANNs are used not only for sub-syllable discrimination but also for modeling the temporal structure of the speech signal. The system is trained using a three-stage training method that embeds the MCE/GPD algorithms. In addition, a fast recognition method using multi-level pruning is proposed. Experimental results show that the proposed method outperforms the HMM method in both recognition accuracy and computational complexity.