Abstract

Automatic speech recognition (ASR) systems have gained popularity because many multimedia applications require robust speech recognition algorithms. Combining audio and visual information in speaker-independent continuous speech recognition improves performance over audio-only systems: recognition rates increase markedly when visual data supplements the audio, since video is less susceptible to ambient noise than audio. This paper presents a robust audio-visual speech recognition (AVSR) system that uses a coupled hidden Markov model (CHMM) to fuse the audio and video modalities. The application records the input data and recognizes isolated words in the input file over a wide range of signal-to-noise ratios (SNRs). Experimental results show an increase of about 10% in recognition rate for the AVSR system compared with audio-only ASR, and about 20% compared with video-only ASR, at an SNR of 5 dB.
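In a CHMM the audio and video streams are coupled at the hidden-state level, which is more expressive than simply combining per-stream scores. As a much-simplified illustration of SNR-dependent audio-visual fusion (not the paper's CHMM, and with ramp endpoints chosen purely for illustration), the sketch below weights per-word log-likelihoods from each stream by an audio reliability derived from the SNR:

```python
def snr_to_audio_weight(snr_db, lo=-5.0, hi=25.0):
    """Map SNR (dB) to an audio stream weight in [0, 1]; the video
    stream gets the complement. Linear ramp between lo and hi dB
    (the ramp endpoints here are illustrative assumptions)."""
    w = (snr_db - lo) / (hi - lo)
    return min(1.0, max(0.0, w))

def fuse_scores(audio_ll, video_ll, snr_db):
    """Weighted log-likelihood fusion over a word vocabulary.
    audio_ll / video_ll: dicts mapping word -> per-stream log-likelihood.
    Returns the best word and the fused scores."""
    wa = snr_to_audio_weight(snr_db)
    wv = 1.0 - wa
    fused = {w: wa * audio_ll[w] + wv * video_ll[w] for w in audio_ll}
    return max(fused, key=fused.get), fused

# At low SNR the video stream dominates, at high SNR the audio stream does.
audio_ll = {"yes": -10.0, "no": -5.0}   # noisy audio favors "no"
video_ll = {"yes": -4.0, "no": -12.0}   # clean video favors "yes"
print(fuse_scores(audio_ll, video_ll, -5.0)[0])  # video-dominated
print(fuse_scores(audio_ll, video_ll, 25.0)[0])  # audio-dominated
```

This late-fusion scheme captures the intuition behind the reported results (visual cues compensate for audio degraded by noise), but the actual system performs state-level coupling inside the CHMM rather than score-level weighting.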
