Visual Speech Processing and Recognition

U B Mahadevaswamy, M Shashank Rao, C Anagha, S Vrushab, V Sangameshwar

doi:10.1007/978-981-15-3383-9_44

Abstract

Lip reading is the ability to understand what a person is communicating using only visual information. With the advent of the Internet and computers, it is now possible to automate lip reading without human intervention. Such automation is feasible because of two developments in the field of computer vision: the availability of a large-scale dataset for training and the use of neural network models. The applications are numerous. From dictating messages to a device in a noisy environment to improving speech recognition in current technologies, visual speech recognition has proved to be pivotal. In this paper, the lip-reading models are based on deep neural network architectures that capture temporal information and are designed for the task of speech recognition.

Full Text