Variable Structure and Modeling Units for Chinese Lipreading

Baosheng Sun, Dongliang Xie, Tiantian Duan

doi:10.1142/s0218001422560213

Abstract

Lipreading is a form of Human–Computer Interaction (HCI) based on visual information. From a linguistic point of view, Chinese is a monosyllabic language with a much higher proportion of homophones than English, which makes identifying homophones in Mandarin Chinese lipreading very challenging. Two observations motivate our approach: the lip shapes in the surrounding context can distinguish homophones, and smaller recognition units reduce the number of classes to recognize and alleviate data sparsity. We therefore propose to improve lipreading accuracy by simultaneously exploiting the correlation of lip features at different distances and smaller modeling units. We implement a long short-term multi-feature space to represent lip features and use CTC–Attention to learn temporal correlations. We also introduce a Weighted Finite-State Transducer (WFST) to enhance the semantic analysis capability of the model. To reduce data sparsity, we use Tonal Initials and Finals (TIF) as the modeling units. We record a sentence-level Chinese lipreading dataset, ICSLR, and label it with Mandarin characters, syllables, and TIF. Extensive experiments on the Grid, ICSLR, and CMLR datasets demonstrate the effectiveness of the proposed approach compared to its counterparts.
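To illustrate why initials and finals are smaller modeling units than whole syllables, the following is a minimal sketch that splits tone-numbered pinyin syllables into an initial and a tone-carrying final. It is not the authors' labeling scheme: the function names `split_syllable` and `to_units`, the tone-number notation (e.g. `zhong1`), the choice to attach the tone to the final, and the treatment of `y` and `w` as initials are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical initial inventory (an assumption, not taken from the paper).
# Two-letter initials come first so that "zh" is matched before "z".
# Treating "y" and "w" as initials is a simplification.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a tone-numbered pinyin syllable into (initial, tonal final)."""
    for initial in INITIALS:
        # Require a non-empty remainder so the final is never empty.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    # Zero-initial syllable such as "an1": the whole syllable is the final.
    return "", syllable


def to_units(sentence: str) -> List[str]:
    """Convert space-separated syllables into a flat list of units."""
    units: List[str] = []
    for syllable in sentence.split():
        initial, final = split_syllable(syllable)
        if initial:
            units.append(initial)
        units.append(final)
    return units


if __name__ == "__main__":
    print(to_units("zhong1 guo2"))
```

Under these assumptions, many different syllables share the same initial or the same tonal final, so each unit is observed more often in a dataset of a given size than any single whole syllable would be.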

More From: International Journal of Pattern Recognition and Artificial Intelligence

Similar Papers

Lip feature extraction based on audio-visual correlation
...
08 Sep 2005

A Method for Dealing with Data Sparsity and Cold-Start Limitations in Service Recommendation Using Personalized Preferences
Kenneth K Fletcher
01 Jun 2017

Lip Feature Extraction and Feature Evaluation in the Context of Speech and Speaker Recognition
Petar S Aleksic ... Aggelos K Katsaggelos
01 Jan 2009

Six key points lip's feature extraction using adaptive threshold segmentation
Hadid Tunas Bangsawan ... Ronny Mardiyanto
01 May 2015
