A Survey of Research on Lipreading Technology

Mingfeng Hao, Alimjan Aysa, Nurbiya Yadikar, Mutallip Mamut, Kurban Ubul

doi:10.1109/access.2020.3036865

Abstract

Although automatic speech recognition (ASR) technology is mature, some problems remain unsolved, such as how to accurately identify what a speaker is saying in a noisy environment. Lipreading is a visual speech recognition technology that recognizes speech content from the motion characteristics of the speaker's lips, without using speech signals. Lipreading can therefore recover what the speaker is saying in a noisy environment, even when no voice signal is available. This article summarizes the main research on lipreading, from traditional methods to deep learning methods. Traditional lipreading methods are discussed from three aspects: lip detection and extraction, lip feature extraction, and classification. Traditional feature extraction methods rely on handcrafted features, which are not very reliable under unconstrained conditions. In recent years, traditional lipreading methods have gradually been replaced by deep learning methods, whose advantage is that they can learn the best features from large databases. This article analyzes typical deep learning methods in detail according to their structural characteristics, and lists existing lipreading databases, including their detailed information and the methods applied to them. Finally, the problems and challenges of current lipreading methods are discussed, and future research directions are outlined.

Highlights

People often communicate through hearing and vision, that is, through voice signals and visual signals.
Speech signals often contain more information than visual signals, so many studies have focused on Automatic Speech Recognition (ASR).
Visual speech recognition, known as Automatic Lipreading (ALR), infers speech content from the movement of the lips during speaking.

Summary

Mingfeng Hao1, Mutallip Mamut2, Nurbiya Yadikar1, Alimjan Aysa3,4, and Kurban Ubul1,4 * (Member, IEEE)

This work was supported by the National Natural Science Foundation of China under Grants No. 61862061, 61563052, 62061045, and 61363064; the Scientific Research Initiate Program of Doctors of Xinjiang University under Grant No. BS180268; and the Funds for Creative Groups of Higher Educational Research Plan in Xinjiang Uyghur Autonomous, China under Grant No. XJEDU2017T002.

INTRODUCTION

TRADITIONAL FEATURE EXTRACTION AND RECOGNITION METHODS

LIP DETECTION AND EXTRACTION

Method

DEEP NEURAL NETWORK BASED METHODS

DATABASES AND PERFORMANCE COMPARISON

DIFFICULTIES AND CHALLENGES OF LIPREADING

CONCLUSION

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 165
License type: CC BY 4.0

Similar Papers

Lip-Reading Research Based on ShuffleNet and Attention-GRU
Yixian Fu ... Yuanyao Lu
01 Jan 2023

Joint Representation and Recognition for Ship-Radiated Noise Based on Multimodal Deep Learning
Fei Yuan ... En Cheng
Journal of Marine Science and Engineering | VOL. 7
27 Oct 2019

Neighbouring Proximity - An Key Impact Factor of Deep Machine Learning
Hongyuan Shi ... Yunke Li
01 Jul 2018

Lips detection for audio-visual speech recognition system
Siew Wen Chin ... Kah Phooi Seng
01 Feb 2009
