Abstract

Machine speech recognition has improved greatly through artificial intelligence. However, machines still fall short of the recognition ability of the human auditory system. Based on established physiological principles of the human auditory system, this paper proposes a novel emergent auditory model. The model simulates each key part of the human auditory pathway with a deep developmental network (DDN). It also simulates the function of the superior colliculus in the thalamus, i.e., context integration, as an additional layer in the DDN. Mel-frequency cepstral coefficients (MFCC) are extracted from the speech signal as features and serve as the inputs of the DDN. This work differs from previous models in that we emphasise the mechanism by which a system develops its emergent representations from its operational experience: the internal unsupervised neurons of the DDN are used to represent short contexts, and the competition among them offers an interpretation of how such internal neurons come to denote different speech contexts when they are not supervised by the external world. Experimental results show the advantage of the proposed DDN over state-of-the-art methods in recognition accuracy for English words and phrases.
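
As an illustration of the MFCC front end mentioned above, the following is a minimal sketch of a standard MFCC computation in Python with NumPy. It is not the authors' implementation: the abstract does not give the frame length, hop size, number of mel filters, or number of coefficients, so the values used here (25 ms frames, 10 ms hop at 16 kHz, 26 filters, 13 coefficients, pre-emphasis of 0.97) are common defaults assumed for the example, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (O'Shaughnessy variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # All default parameter values are assumptions, not taken from the paper.
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. DCT-II (unnormalised) decorrelates the log energies; keep n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T


# Usage: one second of a 440 Hz tone yields one 13-dimensional vector per frame.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `features` is one frame's feature vector; in a pipeline like the one described, such vectors would be presented to the network's input layer frame by frame.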
