Abstract

Different parameterizations of the speech signal can extract complementary information that is useful for increasing accuracy in discriminating between confusable sound classes. In spite of this, a single parameterization has been used almost universally in speech recognition, because the most widespread matching technology (hidden Markov models) is bound by theoretical and practical constraints that limit the use of multiple features derived from the speech signal with different processing algorithms. Neural networks, on the contrary, can incorporate multiple heterogeneous input features, which do not need to be treated as independent, and can find the optimal combination of these features for classification. The purpose of this work is to exploit this potential of neural networks to improve speech recognition accuracy. The multiple input features produced by different parameterization algorithms are combined through a network architecture called the multi-source NN, designed to obtain the best synergy among them. In this work we report the latest results obtained along this research line by combining the basic spectral features with two auditory-inspired features, a formant-like feature, and the frequency derivatives. The results show that the multi-source NN leads to significant error reductions on both isolated-word and continuous-speech test sets.
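The abstract does not give implementation details, but a minimal sketch may help illustrate the idea of a multi-source network: each feature stream (e.g. spectral, auditory-inspired, formant-like, frequency-derivative features) is processed by its own hidden layer, and the resulting representations are merged into a shared classification layer. All layer sizes, stream dimensions, class counts, and names below are illustrative assumptions, not values or code from the paper.

# Hedged sketch of a multi-source network, not the authors' implementation.
import torch
import torch.nn as nn

class MultiSourceNN(nn.Module):
    def __init__(self, stream_dims, hidden_dim=64, num_classes=40):
        super().__init__()
        # One hidden layer per input stream (e.g. spectral, auditory-inspired,
        # formant-like, frequency-derivative features).
        self.stream_layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden_dim), nn.Sigmoid())
             for d in stream_dims]
        )
        # Shared layer that combines all streams and predicts the sound classes.
        self.combiner = nn.Linear(hidden_dim * len(stream_dims), num_classes)

    def forward(self, streams):
        # `streams` is a list of tensors, one per parameterization of the frame.
        hidden = [layer(x) for layer, x in zip(self.stream_layers, streams)]
        return self.combiner(torch.cat(hidden, dim=-1))

# Example with four streams of assumed feature dimensions.
model = MultiSourceNN(stream_dims=[24, 20, 8, 24])
frame = [torch.randn(1, d) for d in [24, 20, 8, 24]]
logits = model(frame)

Because the streams are combined inside one trained network rather than treated as independent evidence, the hidden layers can learn feature interactions across parameterizations, which is the property the abstract attributes to the multi-source approach.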
