ZeroEGGS: Zero‐shot Example‐based Gesture Generation from Speech

Saeed Ghorbani, Marc‐André Carbonneau, Ylva Ferstl, Daniel Holden, Nikolaus F. Troje

doi:10.1111/cgf.14734

Open Access

https://doi.org/10.1111/cgf.14734

Journal: Computer Graphics Forum
Publication Date: Feb 1, 2023
Citations: 16
License type: CC BY-NC-ND 4.0

Affiliation: Ubisoft (Canada), York University

Abstract

We present ZeroEGGS, a neural network framework for speech‐driven gesture generation with zero‐shot style control by example. This means style can be controlled with only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or through blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs for a given input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state‐of‐the‐art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high‐quality dataset of full‐body gesture motion including fingers, with speech, spanning 19 different styles. Our code and data are publicly available at https://github.com/ubisoft/ubisoft-laforge-ZeroEGGS.
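
The blending and scaling of style embeddings mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say which operations are used, so this assumes a style embedding is a fixed‐length vector, that blending is linear interpolation, and that scaling is multiplication by a scalar. The function names `blend_styles` and `scale_style` and the example vectors are hypothetical and are not taken from the ZeroEGGS code.

```python
from typing import List, Sequence


def blend_styles(a: Sequence[float], b: Sequence[float], weight: float) -> List[float]:
    """Linearly interpolate two style embeddings (assumed operation).

    weight=0.0 returns a, weight=1.0 returns b.
    """
    if len(a) != len(b):
        raise ValueError("style embeddings must have the same length")
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]


def scale_style(embedding: Sequence[float], factor: float) -> List[float]:
    """Multiply a style embedding by a scalar (assumed operation)."""
    return [factor * x for x in embedding]


# Toy 3-dimensional embeddings; real embeddings would come from a style encoder.
calm = [0.0, 1.0, -2.0]
excited = [2.0, 3.0, 0.0]

halfway = blend_styles(calm, excited, 0.5)   # midpoint between the two styles
exaggerated = scale_style(excited, 1.5)      # amplified version of one style

print(halfway)      # [1.0, 2.0, -1.0]
print(exaggerated)  # [3.0, 4.5, 0.0]
```

In a setup like this, the blended or scaled vector would be passed to the gesture generator in place of an embedding computed from a single example clip.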

Full Text