Generating Diverse Gestures from Speech Using Memory Networks as Dynamic Dictionaries

Zeyu Zhao, Nan Gao, Shuwu Zhang, Zhi Zeng

doi:10.1109/cost57098.2022.00042

Abstract

People naturally enhance their speech with body motion and gestures. Generating gestures for digital humans or virtual avatars from speech audio or text remains challenging because the mapping from speech to gesture is non-deterministic. We observe that existing neural methods often produce gestures with too little movement, which appear slow or dull. We therefore propose a novel generative model coupled with memory networks that act as dynamic dictionaries, with the aim of generating more diverse gestures. In the proposed model, a dictionary network dynamically stores previously seen pose features together with their corresponding text features for the generator to look up, while a pose generation network takes audio and pose features as input and outputs the resulting gesture sequences. Seed poses are used during generation to ensure continuity between consecutive speech segments. We also propose a new objective evaluation metric for the diversity of generated gestures and demonstrate that the proposed model generates gestures with improved diversity.
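The dictionary idea described in the abstract can be illustrated with a toy key-value memory. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `DynamicDictionary`, the fixed capacity with oldest-first eviction, and the softmax-weighted lookup over dot-product similarities are all hypothetical choices made for illustration, and the feature vectors are plain Python lists rather than learned network outputs.

```python
import math


class DynamicDictionary:
    """Toy key-value memory: text-feature keys map to pose-feature values.

    Hypothetical illustration only; the paper's dictionary network is not
    specified in the abstract and may work differently.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []    # text feature vectors (assumed representation)
        self.values = []  # pose feature vectors stored alongside each key

    def write(self, text_feat, pose_feat):
        # Store a new pair; evict the oldest entry once the memory is full
        # (eviction policy is an assumption).
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(list(text_feat))
        self.values.append(list(pose_feat))

    def lookup(self, query):
        # Soft lookup: softmax over dot-product similarities between the
        # query and every key, then a weighted average of stored values.
        if not self.keys:
            raise ValueError("memory is empty")
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values))
            for i in range(dim)
        ]


# Usage: two stored pairs, then a query that closely matches the first key.
memory = DynamicDictionary(capacity=2)
memory.write([1.0, 0.0], [0.5, 0.5, 0.0])
memory.write([0.0, 1.0], [0.0, 0.2, 0.9])
pose_feat = memory.lookup([10.0, 0.0])
```

In this sketch the retrieved pose feature is dominated by the value stored under the best-matching text key. A generator could consume such a retrieved feature alongside its audio input, although how the paper combines the two is not stated in the abstract.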

Similar Papers

Improving diversity of speech‐driven gesture generation with memory networks as dynamic dictionaries
Zeyu Zhao ... Zhi Zeng
CAAI Transactions on Intelligence Technology | VOL. 9
22 Apr 2024

MOVIN: Real‐time Motion Capture using a Single LiDAR
Deok‐Kyeong Jang ... Dongseok Yang
Computer Graphics Forum | VOL. 42
01 Oct 2023

Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
Shenhan Qian ... Yihao Zhi
01 Oct 2021

Classifying Alzheimer's Disease Using Audio and Text-Based Representations of Speech.
R'Mani Haulcy ... James Glass
Frontiers in Psychology | VOL. 11
15 Jan 2021
