Abstract

Lipreading is the ability to recognize words or sentences from the mouth movements of a speaking person, a process also known as Visual Speech Recognition (VSR). Lipreading has two main advantages: facilitating communication for people with hearing or speaking impairments, and aiding speech recognition in noisy environments. In this paper, we propose a lipreading computing system capable of recognizing ten common Arabic words from the speaker's mouth movements. The system receives a video of a person uttering an Arabic word as input and outputs the text of the predicted word. During the implementation stage of the proposed system, three deep learning and neural network architectures are alternatively used to train, validate, and test the system on a locally collected and preprocessed dataset. The dataset contains 1051 videos and will be made available upon request. Moreover, a voting model that combines the three architectures is proposed. The highest testing accuracy (i.e., 82.84%) is achieved by leveraging the voting model.
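The abstract does not specify how the voting model combines the three architectures' outputs. A common scheme for such ensembles is majority (hard) voting over the per-model predicted labels; the sketch below illustrates that idea only, and the function name and tie-breaking rule are assumptions, not the paper's method.

```python
from collections import Counter

def majority_vote(predictions):
    """Hypothetical hard-voting combiner: return the word predicted
    by most models. On a tie, fall back to the first model's output."""
    counts = Counter(predictions)
    top, top_count = counts.most_common(1)[0]
    # If more than one label shares the highest count, it is a tie.
    if list(counts.values()).count(top_count) > 1:
        return predictions[0]
    return top

# Example: three models classify one video of an uttered Arabic word.
print(majority_vote(["salam", "salam", "shukran"]))  # -> "salam"
```

An alternative would be soft voting, i.e., averaging the models' class-probability vectors before taking the argmax, which the paper may equally well intend.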
