Abstract

This paper proposes a lip reading method based on convolutional neural networks applied to the Concatenated Three Sequence Keyframe Image (C3-SKI), which consists of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI), the last of which marks the end of the pronunciation of a syllable. Each lip-area image was reduced to 32×32 pixels per frame, and three keyframes concatenated together were used to represent one syllable as a 96×32-pixel image for visual speech recognition. The three keyframes representing each syllable are selected based on the relative maximum and relative minimum of the open lip’s width and height. On the THDigits dataset, the model achieved accuracy, validation accuracy, loss, and validation loss values of 95.06%, 86.03%, 4.61%, and 9.04%, respectively. The C3-SKI technique was also applied to the AVDigits dataset, where it reached 85.62% accuracy. In conclusion, the C3-SKI technique can be applied to lip reading recognition.
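As a rough illustration of the image layout described above, the following Python sketch places three 32×32 keyframes side by side and finds relative extrema in a lip-opening signal. It is not the authors' implementation: the side-by-side layout (96 wide, 32 high), the function names `concat_keyframes` and `relative_extrema`, and the sample measurements are all assumptions made for illustration, and the abstract does not state the exact rule that maps extrema to the SLI, MLI, and ELI.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale image stored as rows of pixel values

FRAME_SIZE = 32  # each lip keyframe is 32x32 pixels, per the abstract


def concat_keyframes(sli: Frame, mli: Frame, eli: Frame) -> Frame:
    """Join three 32x32 keyframes side by side (assumed layout: 96 wide, 32 high)."""
    for frame in (sli, mli, eli):
        if len(frame) != FRAME_SIZE or any(len(row) != FRAME_SIZE for row in frame):
            raise ValueError("each keyframe must be 32x32")
    # Row i of the result is row i of SLI, then MLI, then ELI.
    return [s + m + e for s, m, e in zip(sli, mli, eli)]


def relative_extrema(signal: List[float]) -> Tuple[List[int], List[int]]:
    """Return indices of strict relative maxima and minima of a 1-D signal."""
    maxima: List[int] = []
    minima: List[int] = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima


# Hypothetical per-frame lip-opening measurements (for example, height in pixels).
opening = [2.0, 5.0, 9.0, 6.0, 3.0, 4.0, 1.0]
maxima, minima = relative_extrema(opening)

# Dummy constant-valued frames, used only to show the resulting dimensions.
sli = [[0] * FRAME_SIZE for _ in range(FRAME_SIZE)]
mli = [[128] * FRAME_SIZE for _ in range(FRAME_SIZE)]
eli = [[255] * FRAME_SIZE for _ in range(FRAME_SIZE)]
c3ski = concat_keyframes(sli, mli, eli)
print(len(c3ski[0]), len(c3ski))  # width, height of the concatenated image
```

In a real pipeline the frames would come from a lip-region detector and a resize step, and the concatenated image would be passed to the CNN as a single input.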

Highlights

  • Deep learning applications, especially Convolutional Neural Network (CNN) applications, have recently achieved impressive success in diverse object detection and recognition tasks [1], but CNNs still face some challenges, particularly in video recognition

  • This paper proposes the application of the C3-SKI to a CNN for lip reading

  • The C3-SKI, consisting of the Start-Lip Image (SLI), Middle-Lip Image (MLI), and End-Lip Image (ELI), was tested for lip reading recognition on the THDigits and AVDigits datasets


Summary

Introduction

Deep learning applications, especially Convolutional Neural Network (CNN) applications, have recently achieved impressive success in diverse object detection and recognition tasks [1], but CNNs still face some challenges, particularly in video recognition. If the audio at a crucial moment is missing, the video’s contents may be misunderstood [2]. Such videos would be more useful if they were edited and the missing words or messages could be recovered. Most proposed solutions rely on lip reading to aid transcription: the moving lips, together with the tongue and face, are observed to determine the right words. Transcribing or translating speech by lip reading is a skill that requires learning and practice until one becomes proficient at recognizing the lip movement or lip pattern associated with the pronunciation of each syllable.

