Abstract

Different parameterizations of the speech signal can extract complementary information that is useful for increasing accuracy in discriminating between confusable sound classes. In spite of this, a single parameterization has been used almost universally in speech recognition, because the most widespread matching technology (hidden Markov models) is bound by theoretical and practical constraints that limit the use of multiple features derived from the speech signal with different processing algorithms. Neural networks, on the contrary, can incorporate multiple heterogeneous input features, which do not need to be treated as independent, and can find the optimal combination of these features for classification. The purpose of this work is to exploit this potential of neural networks to improve speech recognition accuracy. The multiple input features produced by different parameterization algorithms are combined through a network architecture called the multi-source NN, designed to obtain the best synergy among them. In this work we report the latest results obtained along this research line by combining the basic spectral features with two auditory-inspired features, a formant-like feature, and the frequency derivatives. The results show that the multi-source NN leads to significant error reductions on both isolated-word and continuous-speech test sets.
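The abstract does not give implementation details, but a minimal sketch may help illustrate the idea of a multi-source network: each feature stream (e.g. spectral, auditory-inspired, formant-like, frequency-derivative features) is processed by its own hidden layer, and the resulting representations are merged into a shared classification layer. All layer sizes, stream dimensions, class counts, and names below are illustrative assumptions, not values or code from the paper.

# Hedged sketch of a multi-source network, not the authors' implementation.
import torch
import torch.nn as nn

class MultiSourceNN(nn.Module):
    def __init__(self, stream_dims, hidden_dim=64, num_classes=40):
        super().__init__()
        # One hidden layer per input stream (e.g. spectral, auditory-inspired,
        # formant-like, frequency-derivative features).
        self.stream_layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden_dim), nn.Sigmoid())
             for d in stream_dims]
        )
        # Shared layer that combines all streams and predicts the sound classes.
        self.combiner = nn.Linear(hidden_dim * len(stream_dims), num_classes)

    def forward(self, streams):
        # `streams` is a list of tensors, one per parameterization of the frame.
        hidden = [layer(x) for layer, x in zip(self.stream_layers, streams)]
        return self.combiner(torch.cat(hidden, dim=-1))

# Example with four streams of assumed feature dimensions.
model = MultiSourceNN(stream_dims=[24, 20, 8, 24])
frame = [torch.randn(1, d) for d in [24, 20, 8, 24]]
logits = model(frame)

Because the streams are combined inside one trained network rather than treated as independent evidence, the hidden layers can learn feature interactions across parameterizations, which is the property the abstract attributes to the multi-source approach.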
