Dual-flow Spatio-temporal Separation Network for Lip Reading

An Huang, Xueyi Zhang

doi:10.1088/1742-6596/2400/1/012028

Abstract

Lip reading is the task of predicting the language content of a silent video, and it has attracted considerable attention in recent years. The key is to capture temporal and spatial features from lip-motion videos and decode them. Previous deep-learning methods mostly connect spatial and temporal modeling in series: they first extract spatial features and then perform global temporal modeling on top of them. The spatial information extracted this way is insufficient. To obtain a richer spatio-temporal video representation and fully integrate features from different viewpoints, this paper proposes a novel lip-motion feature extraction framework, the Dual-flow Spatio-temporal Separation Network (DSSN). Specifically, we adopt an end-to-end two-tower structure that models temporal and spatial information separately and fuses the resulting features through collaborative learning. We evaluate the proposed model on the OuluVS2 lip reading dataset, and experiments show that it outperforms the baseline models.
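
The two-branch idea in the abstract can be illustrated with a toy sketch. This is not the DSSN architecture, whose layers the abstract does not specify: the function names `spatial_tower`, `temporal_tower`, and `fuse` are hypothetical, the learned networks are replaced by simple hand-written statistics, and plain concatenation stands in for the collaborative-learning fusion. It only shows how spatial and temporal features might be computed separately from the same clip and then merged.

```python
from statistics import fmean

# A clip is a nested list: frames x rows x columns of grayscale values.

def spatial_tower(video):
    """Appearance only: mean intensity of each row, computed per frame."""
    return [[fmean(row) for row in frame] for frame in video]

def temporal_tower(video):
    """Motion only: mean absolute change between consecutive frames."""
    motion = []
    for prev, curr in zip(video, video[1:]):
        diffs = [abs(c - p)
                 for prev_row, curr_row in zip(prev, curr)
                 for p, c in zip(prev_row, curr_row)]
        motion.append(fmean(diffs))
    return motion

def fuse(spatial, temporal):
    """Toy fusion (an assumption): pool spatial features over time, then concatenate."""
    n_rows = len(spatial[0])
    pooled = [fmean(frame[r] for frame in spatial) for r in range(n_rows)]
    return pooled + temporal

# Three 2x2 frames in which brightness spreads from the top row to the bottom row.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
features = fuse(spatial_tower(video), temporal_tower(video))
print(features)
```

The fused vector holds two pooled spatial values followed by one motion value per frame transition. In a real model each branch would be a trained network and the fusion step would itself be learned.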

Journal: Journal of Physics: Conference Series
Publication Date: Dec 1, 2022
Citations: 1
License type: cc-by

Similar Papers

Speaker-Adaptive Lip Reading with User-Dependent Padding
Minsu Kim ... Hyunjun Kim
01 Jan 2021

Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Yaman Kumar ... Rajiv Ratn Shah
15 Oct 2018

Lip Reading: Delving into Deep Learning
Rishabh Nevatia
International Journal for Research in Applied Science and Engineering Technology | VOL. 9
30 Sep 2021

Spatial-temporal interaction learning based two-stream network for action recognition
Tianyu Liu ... Ping Jiang
Information Sciences | VOL. 606
28 May 2022
