Live speech portraits

Yuanxun Lu, Xun Cao, Jinxiang Chai

doi:10.1145/3478513.3480484

Abstract

To the best of our knowledge, we present the first live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system contains three stages. The first stage is a deep neural network that extracts deep audio features, along with a manifold projection that maps the features to the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper body motions, where the former are generated by an autoregressive probabilistic model that models the head pose distribution of the target person. Upper body motions are deduced from head poses. In the final stage, we generate conditional feature maps from the previous predictions and send them, together with a candidate image set, to an image-to-image translation network to synthesize photorealistic renderings. Our method generalizes well to wild audio and successfully synthesizes high-fidelity personalized facial details, e.g., wrinkles and teeth. Our method also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.
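
The autoregressive probabilistic head-pose model mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `predict_pose_distribution` is a hypothetical stand-in for a learned network, the per-frame audio feature is reduced to a single number, and the Gaussian output with fixed standard deviation is an assumption made only to show the sampling loop, in which each new pose is drawn from a distribution conditioned on earlier poses and the current audio feature.

```python
import random


def predict_pose_distribution(history, audio_feature):
    """Hypothetical stand-in for a learned network (an assumption, not the paper's model).

    Returns a per-dimension Gaussian (mean, std) for the next head pose,
    conditioned on the most recent pose and the current audio feature.
    """
    last_pose = history[-1]
    mean = [0.9 * p + 0.1 * audio_feature for p in last_pose]
    std = [0.05 for _ in last_pose]
    return mean, std


def sample_head_poses(audio_features, initial_pose, seed=0):
    """Sample one pose per audio frame, feeding each sample back as context."""
    rng = random.Random(seed)
    history = [list(initial_pose)]
    for feature in audio_features:
        mean, std = predict_pose_distribution(history, feature)
        # Draw the next pose from the predicted distribution.
        pose = [rng.gauss(m, s) for m, s in zip(mean, std)]
        # Autoregressive step: the sampled pose becomes part of the context.
        history.append(pose)
    return history[1:]


if __name__ == "__main__":
    # Three frames of made-up scalar audio features and a neutral starting pose.
    for pose in sample_head_poses([0.0, 0.5, 1.0], (0.0, 0.0, 0.0)):
        print([round(v, 3) for v in pose])
```

Because the poses are sampled rather than regressed, different seeds give different but plausible motion sequences for the same audio, which is the usual motivation for a probabilistic formulation.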

Journal: ACM Transactions on Graphics
Publication Date: Dec 1, 2021
Citations: 85

Similar Papers

Joint Head Tracking and Pose Estimation for Visual Focus of Attention Recognition
01 Jan 2007

Automatic Detection of Mind Wandering from Video in the Lab and in the Classroom
Nigel Bosch ... Sidney K D'Mello
IEEE Transactions on Affective Computing | VOL. 12
01 Oct 2021

End-to-end multimodal clinical depression recognition using deep neural networks: A comparative analysis
Muhammad Muzammel ... Alice Othmani
Computer Methods and Programs in Biomedicine | VOL. 211
28 Sep 2021

Learning 3D Head Pose From Synthetic Data: A Semi-Supervised Approach
Shubhajit Basak ... Michael Schukat
IEEE Access | VOL. 9
01 Jan 2020
