Expression-tailored talking face generation with adaptive cross-modal weighting

Dan Zeng, Shuaitao Zhao, Junjie Zhang, Han Liu, Kai Li

doi:10.1016/j.neucom.2022.09.025

Abstract

The key to talking face generation is synthesizing natural, identity-preserving facial expressions with accurate audio-lip synchronization. This requires disentangling and fusing latent features from multiple modalities, such as visual identity, facial expressions, and audio. In this paper, we propose an end-to-end Expression-Tailored Generative Adversarial Network with Adaptive Cross-modal Weighting (ET-GAN-ACW). Unlike previous talking face generation methods, which are driven by an identity image and audio, our system takes an expression video of an arbitrary identity as the source. On the one hand, multiple encoders disentangle the expression-tailored representation, the audio-lip embedding, and the face position localization in parallel. In addition, instead of using a single image as the target identity, we propose a multi-image identity encoder that exploits different views of the face and merges them into a unified representation. The proposed Adaptive Cross-modal Weighting (ACW) mechanism then adaptively weights and fuses these informative features from different modalities. On the other hand, multiple discriminators are exploited to produce realistic image-level and video-level details: a frame discriminator enforces frame authenticity, and a spatial–temporal discriminator enforces the visual coherence of facial expression movements. Extensive quantitative evaluations of reconstruction error, identity preservation, expression retention, and audio-visual synchronization demonstrate the superiority of our method, and qualitative results further show that it generates high-quality talking face videos.
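To make the idea of adaptively weighting and fusing per-modality features concrete, here is a minimal Python sketch of one common attention-style scheme. The abstract does not give the ACW formulation, so everything below is an assumption and not the authors' method: `mean_pool` is a hypothetical stand-in for merging multi-view identity embeddings, and `adaptive_fuse` scores each modality against a hypothetical learned `query` vector, turns the scores into softmax weights, and returns the weighted sum.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(views: Sequence[Vector]) -> Vector:
    """Merge per-view identity embeddings into one vector by averaging.

    Hypothetical stand-in for a multi-image identity encoder's merge step.
    """
    if not views:
        raise ValueError("need at least one view embedding")
    dim = len(views[0])
    return [sum(v[i] for v in views) / len(views) for i in range(dim)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(
    features: Dict[str, Vector], query: Vector
) -> Tuple[Dict[str, float], Vector]:
    """Score each modality against `query`, softmax the scores, return weights and weighted sum.

    All feature vectors and `query` are assumed to share one dimension.
    `query` is a hypothetical learned scoring vector (an assumption, not from the paper).
    """
    names = list(features)
    # Dot-product relevance score per modality.
    scores = [sum(f * q for f, q in zip(features[n], query)) for n in names]
    weights = softmax(scores)
    # Weighted sum of modality features, one output dimension at a time.
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused


if __name__ == "__main__":
    identity = mean_pool([[1.0, 0.0], [0.0, 1.0]])  # two hypothetical face views
    weights, fused = adaptive_fuse(
        {"identity": identity, "expression": [1.0, 0.0], "audio": [0.0, 1.0]},
        query=[2.0, 0.0],
    )
    print(weights, fused)
```

Averaging the views and scoring with a single dot product are chosen only for simplicity; in a trained network the encoders and scoring parameters would be learned, and the paper's weighting scheme may differ.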

Journal: Neurocomputing
Publication Date: Sep 9, 2022
Citations: 8

Similar Papers

Talking Face Generation with Expression-Tailored Generative Adversarial Network
Dan Zeng ... Shiming Ge
12 Oct 2020

Discriminative clustering on manifold for adaptive transductive classification
Zhao Zhang ... Fanzhang Li
Neural Networks | VOL. 94
01 Aug 2017

The representation of information about faces in the temporal and frontal lobes
Edmund T Rolls
Neuropsychologia | VOL. 45
23 Jun 2006

Facial expression recognition based on AAM–SIFT and adaptive regional weighting
Fuji Ren ... Zhong Huang
IEEJ Transactions on Electrical and Electronic Engineering | VOL. 10
15 Sep 2015