Abstract

Speech communication in a noisy environment is difficult. Many professionals work in noisy settings such as aviation, construction, or manufacturing, where spoken communication is hard to follow. An automated lip-reading system could help convey instructions and commands in such settings. This paper proposes a novel lip-reading solution that extracts the geometrical shape of lip movement from video and predicts the words or sentences spoken. An India-specific language data set is developed, consisting of lip-movement information captured from 50 persons: students aged 18 to 20 years and faculty aged 25 to 40 years. Each participant spoke a paragraph of 58 words in 10 sentences in Hindi (Devanagari script, spoken in India), recorded under various conditions. The implementation combines facial-part detection with long short-term memory (LSTM) networks. The proposed solution predicts spoken words with 77% accuracy on a 3-word data set and 35% accuracy on a 10-word data set. Sentences are predicted with 20% accuracy, which is encouraging.
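As a rough illustration of what "geometrical shape of lip movement" might mean in practice, the sketch below turns one lip contour per video frame into a short feature vector that a sequence model such as an LSTM could consume. The abstract does not specify the actual features, so the choice of width, height, aspect ratio, and area, the function names `lip_features` and `sequence_features`, and the contour format (an ordered list of `(x, y)` points from some face-landmark detector) are all assumptions made for this example.

```python
def lip_features(contour):
    """Return [width, height, height/width, area] for one lip contour.

    contour: ordered list of (x, y) points around the outer lip.
    (Hypothetical input format, not taken from the paper.)
    """
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    # Polygon area via the shoelace formula over consecutive contour points.
    twice_area = 0.0
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    area = abs(twice_area) / 2.0

    # Guard against a degenerate contour with zero width.
    ratio = height / width if width else 0.0
    return [width, height, ratio, area]


def sequence_features(frames):
    """Map a list of per-frame contours to a list of feature vectors."""
    return [lip_features(contour) for contour in frames]
```

Each spoken word would then be represented as a time-ordered list of such vectors, one per frame, which is the kind of input an LSTM classifier expects.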
