Deep Learning Based Lipreading for Video Captioning

Sankalp Kala, Prof. Sridhar Ranganathan

doi:10.47191/etj/v9i05.08

Abstract

Visual speech recognition, often referred to as lipreading, has attracted significant attention in recent years because of its potential applications in human-computer interaction, accessibility technology, and biometric security systems. This paper explores the challenges and advances in lipreading, which involves deciphering speech from visual cues, primarily the movements of the lips, tongue, and teeth. Although it is an essential part of human communication, lipreading is inherently difficult, especially in noisy environments or when contextual information is limited. The McGurk effect, in which conflicting audio and visual cues produce perceptual illusions, highlights this complexity. Human lipreading performance varies widely, with hearing-impaired individuals achieving relatively low accuracy rates. Automating lipreading with machine learning techniques has emerged as a promising solution, with potential applications ranging from silent dictation in public spaces to biometric authentication systems. Visual speech recognition methods can be broadly divided into those that model words and those that model visemes (visually distinguishable phonemes). Word-based approaches are suitable for isolated word recognition, whereas viseme-based techniques are better suited to continuous speech recognition. This study proposes a novel deep learning architecture for lipreading that uses Conv3D layers for spatiotemporal feature extraction and bidirectional LSTM layers for sequence modelling. The proposed model demonstrates significant improvements in lipreading accuracy, outperforming traditional methods on benchmark datasets. The practical implications of automated lipreading extend beyond accessibility technology to include biometric identity verification, security surveillance, and enhanced communication aids for individuals with hearing impairments. This paper provides insights into the advances, challenges, and future directions of visual speech recognition research, paving the way for innovative applications in diverse domains.
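To make the Conv3D plus bidirectional LSTM pipeline more concrete, the following is a minimal sketch that only traces tensor shapes through such a stack. It is not the authors' model: the abstract gives no layer counts or sizes, so the clip size (75 grayscale frames of 46 x 140 pixels), the three convolution blocks, the channel widths, the pooling windows, and the LSTM hidden size are all assumptions chosen for illustration, as are the helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_shape(shape, out_channels, kernel, padding=(0, 0, 0)):
    """Shape after a stride-1 3D convolution; shape is (channels, frames, height, width)."""
    _, t, h, w = shape
    dims = [conv_out(s, k, 1, p) for s, k, p in zip((t, h, w), kernel, padding)]
    return (out_channels, *dims)


def pool3d_shape(shape, window):
    """Shape after non-overlapping 3D max pooling (stride equals window)."""
    c, t, h, w = shape
    dims = [conv_out(s, k, stride=k) for s, k in zip((t, h, w), window)]
    return (c, *dims)


def trace_shapes(frames=75, height=46, width=140, hidden=128):
    """Trace shapes through a hypothetical Conv3D + BiLSTM lipreading stack.

    Every size here is an illustrative assumption, not a value from the paper.
    """
    x = (1, frames, height, width)  # one grayscale mouth-region clip
    shapes = [("input", x)]
    for i, channels in enumerate((32, 64, 96), start=1):
        # 3x3x3 kernels with padding 1 keep frames, height, and width unchanged.
        x = conv3d_shape(x, channels, kernel=(3, 3, 3), padding=(1, 1, 1))
        shapes.append((f"conv3d_{i}", x))
        # Pool only spatially so that every video frame keeps its own time step.
        x = pool3d_shape(x, window=(1, 2, 2))
        shapes.append((f"pool3d_{i}", x))
    c, t, h, w = x
    # Flatten the channel and spatial axes into one feature vector per frame.
    shapes.append(("flatten_per_frame", (t, c * h * w)))
    # A bidirectional LSTM concatenates forward and backward hidden states.
    shapes.append(("bilstm", (t, 2 * hidden)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:18s} {shape}")
```

The design point the sketch illustrates is that pooling is applied only over height and width, so the temporal axis keeps one step per frame and the recurrent layers can emit a prediction for every frame of the clip.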

Journal: Engineering and Technology Journal
Publication Date: May 15, 2024
License type: CC BY
