Gesture Generation Research Articles

Extended reality (XR) systems are about to be integrated into our daily lives and will provide support in a variety of fields such as education and coaching. Enhancing user experience demands agents that are capable of displaying realistic affective and social behaviors within these systems and, as a prerequisite, of understanding their interaction partner and responding appropriately. Our literature review of recent works in the field of co-speech gesture generation shows that researchers have developed complex models capable of generating gestures characterized by a high level of human-likeness and speaker appropriateness. Nevertheless, this is only true in settings where the agent has an active status (i.e., the agent acts as the speaker) or is delivering a monologue in a non-interactive setting. As illustrated in multiple works and competitions like the GENEA Challenge, these models remain inadequate in generating interlocutor-aware gestures. We define interlocutor-aware gesture generation as the process of displaying gestures that take into account the conversation partner’s behavior. Moreover, in settings where the agent is the listener, generated gestures lack the level of naturalness that we expect from a face-to-face conversation. To overcome these issues, we have designed a pipeline, called TAG2G, composed of a diffusion model, which has been demonstrated to be a stable and powerful tool in gesture generation, and a vector-quantized variational auto-encoder (VQVAE), widely employed to produce meaningful gesture embeddings. Refocusing from monadic to dyadic multimodal input settings (i.e., taking into account text, audio, and previous gestures of both participants of a conversation) allows us to explore and infer the complex interaction mechanisms that lie in a balanced two-sided conversation. According to our results, a multi-agent conversational input setup improves the generated gestures’ appropriateness with respect to the conversational counterparts. Conversely, when the agent is speaking, a monadic approach performs better in terms of the generated gestures’ appropriateness in relation to the speech.
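The abstract names a vector-quantized variational auto-encoder but does not describe its internals. As a rough illustration of the quantization step alone, the sketch below maps each gesture feature vector to its nearest entry in a small codebook. Everything here is an assumption made for illustration: the function names, the two-dimensional features, and the hand-written codebook are hypothetical and are not taken from TAG2G. A real VQ-VAE learns the codebook together with an encoder and a decoder.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Replace each frame by its nearest codebook vector.

    Returns the chosen codebook indices (the discrete "tokens") and the
    corresponding quantized vectors.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Nearest-neighbour lookup over the codebook entries.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Hypothetical toy data: three codebook entries and three feature frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.7]]

indices, quantized = quantize(frames, codebook)
print(indices)
```

In this toy setting the list of indices is the discrete representation that a downstream generative model could operate on, in place of the raw continuous features.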

Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co‐speech gestures is a long‐standing problem in computer animation and is considered an enabling technology for creating believable characters in film, games, and virtual social spaces, as well as for interaction with social robots. The problem is made challenging by the idiosyncratic and non‐periodic nature of human co‐speech gesture motion, and by the great diversity of communicative functions that gestures encompass. The field of gesture generation has seen surging interest in the last few years, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep‐learning‐based generative models that benefit from the growing availability of data. This review article summarizes co‐speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule‐based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text and non‐linguistic input. Concurrent with the exposition of deep learning approaches, we chronicle the evolution of the related training data sets in terms of size, diversity, motion quality, and collection method (e.g., optical motion capture or pose estimation from video). Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human‐like motion; grounding the gesture in the co‐occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.

Articles published on Gesture Generation

Personality Expression using Co-speech Gesture

TAG2G: A Diffusion-Based Approach to Interlocutor-Aware Co-Speech Gesture Generation

Editable Co-Speech Gesture Synthesis Enhanced with Individual Representative Gestures

Audio2Gestures: Generating Diverse Gestures From Audio.

Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022

Priming users with babies’ gestures: Investigating the influences of priming with different development origin of image schemas in gesture elicitation study

Learning hierarchical discrete prior for co-speech gesture generation

DiT-Gesture: A Speech-Only Approach to Stylized Gesture Generation

Improving diversity of speech‐driven gesture generation with memory networks as dynamic dictionaries

Gesture generation by the robotic hand for aiding speech and hard of hearing persons based on Indian Sign Language

Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control

Speech-Driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference.

Cross-Modal Quantization for Co-Speech Gesture Generation

It takes two, not one: context-aware nonverbal behaviour generation in dyadic interactions

Verbal and nonverbal fluency in amyotrophic lateral sclerosis.

Extrovert or Introvert? GAN-Based Humanoid Upper-Body Gesture Generation for Different Impressions

ACT2G

GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents

Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding.

A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation
