Abstract

This study aims to improve 3D gesture generation by exploiting multimodal information derived from human speech. Previous studies have incorporated additional modalities to enhance the quality of generated gestures; however, these methods perform poorly when certain modalities are missing during inference. To address this problem, we propose using speech-derived multimodal priors to improve gesture generation. We introduce a novel method that separates priors from speech and employs these multimodal priors as constraints on gesture generation. Our approach adopts a chain-like modeling scheme that generates facial blendshapes, body movements, and hand gestures sequentially. Specifically, we incorporate rhythm cues derived from facial deformation and a stylization prior based on speech emotion into the gesture generation process. By incorporating multimodal priors, our method improves the quality of generated gestures and eliminates the need for expensive setup preparation during inference. Extensive experiments and user studies confirm that our proposed approach achieves state-of-the-art performance.
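To make the chain-like idea concrete, the sketch below illustrates one plausible reading of the pipeline described above: speech features first drive facial blendshapes, a rhythm prior computed from frame-to-frame facial deformation and a style prior computed from a speech-emotion embedding then condition body motion, and body motion in turn conditions hand gestures. This is a minimal illustration, not the authors' implementation; all module names, dimensions, and architectural choices (GRU backbones, linear heads) are assumptions for demonstration only.

```python
# Minimal sketch of chain-like, prior-conditioned gesture generation.
# All names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class ChainedGestureGenerator(nn.Module):
    def __init__(self, speech_dim=128, face_dim=52, body_dim=57, hand_dim=90,
                 emotion_dim=16, hidden=256):
        super().__init__()
        # Stage 1: speech features -> facial blendshapes
        self.face_net = nn.GRU(speech_dim, hidden, batch_first=True)
        self.face_head = nn.Linear(hidden, face_dim)
        # Rhythm prior: per-frame facial deformation magnitude -> rhythm embedding
        self.rhythm_embed = nn.Linear(1, 16)
        # Style prior: utterance-level speech-emotion embedding -> style code
        self.style_embed = nn.Linear(emotion_dim, 16)
        # Stage 2: speech + face + priors -> body motion
        self.body_net = nn.GRU(speech_dim + face_dim + 32, hidden, batch_first=True)
        self.body_head = nn.Linear(hidden, body_dim)
        # Stage 3: speech + body + priors -> hand gestures
        self.hand_net = nn.GRU(speech_dim + body_dim + 32, hidden, batch_first=True)
        self.hand_head = nn.Linear(hidden, hand_dim)

    def forward(self, speech, emotion):
        # speech: (B, T, speech_dim); emotion: (B, emotion_dim)
        face = self.face_head(self.face_net(speech)[0])          # (B, T, face_dim)
        # Rhythm cue: frame-to-frame change in the generated blendshapes
        deform = face.diff(dim=1, prepend=face[:, :1]).norm(dim=-1, keepdim=True)
        rhythm = self.rhythm_embed(deform)                        # (B, T, 16)
        style = self.style_embed(emotion).unsqueeze(1).expand(-1, speech.size(1), -1)
        priors = torch.cat([rhythm, style], dim=-1)               # (B, T, 32)
        body = self.body_head(self.body_net(torch.cat([speech, face, priors], -1))[0])
        hands = self.hand_head(self.hand_net(torch.cat([speech, body, priors], -1))[0])
        return face, body, hands


# Usage: only speech-derived inputs are needed at inference time.
model = ChainedGestureGenerator()
speech_feats = torch.randn(2, 120, 128)   # e.g. 120 frames of audio features
emotion_feats = torch.randn(2, 16)        # speech-emotion embedding
face, body, hands = model(speech_feats, emotion_feats)
print(face.shape, body.shape, hands.shape)
```

Note that both priors are computed from speech-derived quantities (generated blendshapes and a speech-emotion embedding), which is what lets such a design avoid additional sensor or capture setup at inference.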
