Speech Recognition with Multi-modal Features Based on Neural Networks

Myung Won Kim, Joung Woo Ryu, Eun Ju Kim

doi:10.1007/11893257_55

Abstract

Recent research has focused on fusing audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural-network-based model for robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a four-layer multi-layer perceptron that combines audio and visual features of speech to compensate for the loss of audio information caused by noise. To further improve recognition accuracy in noisy environments, we also propose a post-processing step based on contextual information, namely the sequential patterns of words spoken by a user. Our experimental results show that our model outperforms any single-mode model. In particular, when we use the contextual information, we obtain over 90% recognition accuracy even in noisy environments, which is a significant improvement over the state of the art in speech recognition.
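
The two ideas in the abstract, a four-layer perceptron over combined audio and visual features and a context-based post-processing step, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' design: the fusion by simple concatenation, the sigmoid activations, the layer sizes, the random untrained weights, and the use of a first-order word-transition table as the "sequential pattern" of words. The names `BimodalMLP` and `rescore_with_context` are hypothetical.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer followed by a sigmoid activation."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights; a trained model would load learned values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


class BimodalMLP:
    """Hypothetical 4-layer perceptron: input -> hidden -> hidden -> output."""

    def __init__(self, n_audio, n_visual, n_hidden1, n_hidden2, n_words, seed=0):
        rng = random.Random(seed)
        self.layers = [
            init_layer(n_audio + n_visual, n_hidden1, rng),
            init_layer(n_hidden1, n_hidden2, rng),
            init_layer(n_hidden2, n_words, rng),
        ]

    def forward(self, audio, visual):
        # Assumed fusion strategy: concatenate the two feature vectors.
        x = list(audio) + list(visual)
        for weights, biases in self.layers:
            x = dense(x, weights, biases)
        return normalize(x)  # one score per vocabulary word


def rescore_with_context(word_probs, previous_word, transition):
    """Weight each word score by an assumed P(word | previous word) table."""
    combined = [p * transition[previous_word][w] for w, p in enumerate(word_probs)]
    return normalize(combined)


# Toy usage with made-up feature vectors and a 3-word vocabulary.
model = BimodalMLP(n_audio=4, n_visual=3, n_hidden1=5, n_hidden2=5, n_words=3)
probs = model.forward([0.1, 0.4, 0.2, 0.9], [0.3, 0.7, 0.5])
transition = [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]]
rescored = rescore_with_context(probs, previous_word=0, transition=transition)
best_word = max(range(len(rescored)), key=rescored.__getitem__)
```

The post-processing step is kept separate from the network so that the contextual table can be swapped or disabled without retraining; whether the paper structures it this way is not stated in the abstract.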

Similar Papers

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks
Myung Won Kim ... Joung Woo Ryu
01 Jan 2004

Performance Improvement in Speech Recognition Using Multimodal Features
Myung Won Kim ... Won Moon Song
01 Jan 2007

Auditory processing of speech signals for robust speech recognition in real-world noisy environments
Doh-Suk Kim ... Soo-Young Lee
IEEE Transactions on Speech and Audio Processing | Vol. 7
01 Jan 1998

Research on the application of machine learning models in speech recognition in noisy environments
Yujie Tian
Applied and Computational Engineering | Vol. 19
23 Oct 2023
