Audio2Gestures: Generating Diverse Gestures From Audio.

Jing Li, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao, Di Kang, Wenjie Pei

doi:10.1109/tvcg.2023.3276973

Abstract

People may perform diverse gestures, affected by various mental and physical factors, when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume a one-to-one mapping and thus tend to predict the average of all possible target motions, which easily results in plain, boring motions during inference. We therefore propose to explicitly model the one-to-many audio-to-motion mapping by splitting the cross-modal latent code into a shared code and a motion-specific code. The shared code is expected to be responsible for the motion component that is more correlated with the audio, while the motion-specific code is expected to capture diverse motion information that is more independent of the audio. However, splitting the latent code into two parts poses extra training difficulties. Several crucial training losses and strategies, including a relaxed motion loss, a bicycle constraint, and a diversity loss, are designed to better train the variational autoencoder (VAE). Experiments on both 3D and 2D motion datasets verify that our method generates more realistic and diverse motions than previous state-of-the-art methods, both quantitatively and qualitatively. In addition, our formulation is compatible with discrete cosine transform (DCT) modeling and other popular backbones (e.g., RNN, Transformer). As for motion losses and quantitative motion evaluation, we find that structured losses/metrics (e.g., STFT) that consider temporal and/or spatial context complement the most commonly used point-wise losses (e.g., PCK), resulting in better motion dynamics and more nuanced motion details. Finally, we demonstrate that our method can be readily used to generate motion sequences with user-specified motion clips on the timeline.
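The split of the latent code into a shared code and a motion-specific code can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random linear maps standing in for the learned encoder and decoder, the function names, and the standard normal prior over the motion-specific code are all assumptions made for illustration. It only shows how one audio input can yield several different motions when the motion-specific part of the code is resampled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
AUDIO_DIM, SHARED_DIM, SPECIFIC_DIM, MOTION_DIM = 8, 4, 3, 6

# Stand-ins for learned networks: fixed random linear maps (an assumption).
W_audio_enc = rng.standard_normal((AUDIO_DIM, SHARED_DIM))
W_dec = rng.standard_normal((SHARED_DIM + SPECIFIC_DIM, MOTION_DIM))


def encode_audio(audio):
    """Map audio features to the shared code (the audio-correlated part)."""
    return audio @ W_audio_enc


def decode_motion(shared, specific):
    """Decode one motion frame from the concatenated latent code."""
    return np.concatenate([shared, specific], axis=-1) @ W_dec


def generate(audio, rng):
    """Decode with a freshly sampled motion-specific code.

    Sampling from a standard normal prior is an assumption of this sketch.
    """
    shared = encode_audio(audio)
    specific = rng.standard_normal(SPECIFIC_DIM)
    return decode_motion(shared, specific)


audio = rng.standard_normal(AUDIO_DIM)
motion_a = generate(audio, rng)
motion_b = generate(audio, rng)  # same audio, different motion-specific code
```

Here `motion_a` and `motion_b` share the same audio-derived code but differ because the motion-specific code was sampled twice, which is the one-to-many behavior the abstract describes.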

Journal: IEEE Transactions on Visualization and Computer Graphics
Publication Date: Aug 1, 2024
Citations: 2
