Abstract

Recent advances in computer vision have led to growing interest in lip reading. Lip reading is the process of understanding speech without hearing it; an automated version of this process is referred to as a lip-reading system. To construct an automatic lip-reading system, locating the lips and defining the lip region is essential, especially under varying lighting conditions, which significantly affect the system's robustness. Unfortunately, previous studies have not adequately addressed lip localization under illumination changes and shadows. In this paper, we present a local region-based approach to lip reading. It consists of four main parts: first, detecting and localizing the human face, mouth, and lip region of interest in the first video frame; second, applying pre-processing to overcome the interference caused by illumination effects, shadows, and teeth appearance; third, creating a contour line from sixteen key points under geometric constraints and storing the coordinates of these points; and finally, tracking the coordinates of the sixteen points in the following frames. The proposed method adapts to lip movement and is robust to the appearance of teeth, shadows, and low-contrast environments. Extensive experiments show encouraging results and demonstrate the proposed method's effectiveness compared to existing methods.
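As a rough illustration of the first step of the pipeline, the sketch below locates a face and derives a lip region of interest. The abstract does not name a specific detector, so OpenCV's Haar cascade face detector and a lower-third-of-face mouth heuristic are assumptions made purely for illustration, not the authors' method.

```python
import cv2

# Minimal sketch of step 1 (face/mouth/lip-ROI localization in the first
# frame). The detector and the mouth heuristic are assumptions; the paper's
# actual localization method is not specified in this excerpt.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def locate_lip_roi(frame):
    """Return an assumed lip ROI (x, y, w, h) in a BGR frame, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    # Heuristic: the mouth typically lies in the lower third of the face box,
    # roughly centred horizontally.
    return (x + w // 4, y + 2 * h // 3, w // 2, h // 3)
```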

Highlights

  • The continuous progress of technology is bringing about an irreversible change in the paradigms of interaction between humans and machines

  • This paper presents an approach to lip detection and tracking with two main phases: (i) lip contour extraction in the first frame, followed by (ii) lip tracking in the subsequent frames

  • The lip tracking procedure is applied to a sequence of frames, starting from the first frame and continuing through the entire sequence (a hedged tracking sketch follows this list)
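As an illustration of the tracking phase, the sketch below propagates sixteen lip-contour points from one frame to the next. The excerpt does not specify the tracking algorithm; pyramidal Lucas-Kanade optical flow (OpenCV's calcOpticalFlowPyrLK) is assumed here for illustration only.

```python
import cv2
import numpy as np

def track_contour_points(prev_gray, next_gray, points16):
    """Propagate 16 contour points (float32, shape (16, 1, 2)) to the next frame.

    The tracker choice (Lucas-Kanade) is an assumption, not the paper's method.
    """
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, points16, None,
        winSize=(15, 15), maxLevel=2)
    # Keep only the points the tracker successfully located in the new frame.
    return next_pts[status.flatten() == 1]

if __name__ == "__main__":
    # Dummy demonstration with two synthetic grayscale frames and 16 points.
    prev = np.random.randint(0, 255, (120, 160), dtype=np.uint8)
    nxt = np.roll(prev, 2, axis=1)  # simulate slight horizontal lip motion
    pts = (np.random.rand(16, 1, 2) * [160.0, 120.0]).astype(np.float32)
    print(track_contour_points(prev, nxt, pts).shape)
```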



Introduction

The continuous progress of technology is bringing about an irreversible change in the paradigms of interaction between humans and machines. Traditional modes of human-computer interaction using keyboards, mice, and display monitors are being replaced by more natural ones, e.g. speech, touch, and gesture. New PCs, tablets, and smartphones are increasingly moving toward interaction paradigms so advanced that they will soon be completely transparent to users. Lip reading is used to recognize speech from a speaker's lip movements without hearing the audio. In 1976, an audio-visual illusion became recognized as the McGurk effect [1], which shows that visual cues are combined with auditory information in the listener's mind automatically and unintentionally. The syllable the listener perceives depends on both the visual information and the strength of the audio from the speaker.


