Abstract

Generating videos with semantic meaning, such as gestures in sign language, is a challenging problem. The model must not only learn to generate videos with realistic appearance, but also attend to crucial details in each frame so that precise information is conveyed. In this paper, we focus on the problem of generating long-term gesture videos that carry precise and complete semantic meaning. We develop a novel architecture to learn the temporal and spatial transforms in regions of interest, i.e., the gesturing hands and face in our case. We adopt a hierarchical approach to gesture video generation: we first predict future pose configurations, and then use an encoder-decoder architecture to synthesize future frames conditioned on the predicted pose structures. We introduce an action-progress scheme in our architecture to represent how far an action has advanced within its expected execution, which instructs the model to synthesize actions at various paces. Our approach is evaluated on two challenging datasets for the task of gesture video generation. Experimental results show that our method produces gesture videos with more realistic appearance and more precise meaning than state-of-the-art video generation approaches.
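To make the two-stage pipeline concrete, below is a minimal sketch, assuming PyTorch: a pose predictor conditioned on an action-progress scalar, followed by an encoder-decoder that renders a frame from a reference image and a rasterized pose map. The module names, tensor shapes, and layer choices here are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the hierarchical pipeline described in the abstract.
import torch
import torch.nn as nn


class PosePredictor(nn.Module):
    """Predicts future pose keypoints, conditioned on action progress in [0, 1]."""

    def __init__(self, num_keypoints=21, hidden=256):
        super().__init__()
        # Input per step: flattened (x, y) keypoints plus one progress scalar.
        self.gru = nn.GRU(num_keypoints * 2 + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_keypoints * 2)

    def forward(self, past_poses, progress):
        # past_poses: (B, T, K*2); progress: (B, T, 1), fraction of action completed.
        h, _ = self.gru(torch.cat([past_poses, progress], dim=-1))
        return self.head(h)  # predicted poses for the next T steps


class FrameSynthesizer(nn.Module):
    """Encoder-decoder that renders a frame from a reference image and a pose map."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + 1, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, ref_frame, pose_map):
        # ref_frame: (B, 3, H, W); pose_map: (B, 1, H, W) rasterized keypoints.
        return self.decoder(self.encoder(torch.cat([ref_frame, pose_map], dim=1)))
```

In this reading, the progress scalar lets the same pose predictor be driven faster or slower at inference time, which is one plausible way to realize the "various paces" behavior the abstract describes.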
