Abstract

Assistive technology based on Audio-Visual Speech Recognition (AVSR) can be of immense benefit to hearing-impaired people. Around 466 million people worldwide suffer from hearing loss, and hearing-impaired students rely on lip reading to understand speech. A lack of trained sign language facilitators and the high cost of assistive devices are among the major challenges these students face. In this work, we identify a visual speech recognition technique built on cutting-edge deep learning models; however, existing VSR techniques remain error-prone. To address this gap, we propose a novel technique that fuses the results of audio and visual speech recognition. This study presents a new deep learning based audio-visual speech recognition model for efficient lip reading. The proposed system significantly improves performance, achieving a word error rate of about 6.59% for the ASR system and an accuracy of about 95% for the lip reading model.
