Abstract
While progress has been made in the field of portrait reenactment, producing high-fidelity and robust videos remains an open problem. Recent studies typically struggle with rarely seen target poses due to limited source data. This paper proposes the Video Portrait via Non-local Quantization Modeling (VPNQ) framework, which produces pose- and disturbance-robust reenactable video portraits. Our key insight is to learn position-invariant quantized local patch representations and to build a mapping between simple driving signals and local textures with non-local spatial-temporal modeling. Specifically, instead of learning a universal quantized codebook, we identify that a personalized one can be trained to better preserve the desired position-invariant local details. A simple representation of projected landmarks can then serve as a sufficient driving signal, avoiding 3D rendering. Next, we employ a carefully designed Spatio-Temporal Transformer to predict reasonable and temporally consistent quantized tokens from the driving signal. The predicted codes can be decoded back into robust, high-quality videos. Comprehensive experiments validate the effectiveness of our approach.
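To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the stages the abstract names: encoding projected landmarks as the driving signal, a spatio-temporal transformer predicting quantized patch tokens, and looking those tokens up in a codebook for decoding. All module names, dimensions, and the token layout here are assumptions for illustration only, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LandmarkEncoder(nn.Module):
    """Embeds 2D projected landmarks (the driving signal) into per-frame features."""
    def __init__(self, num_landmarks: int = 68, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_landmarks * 2, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, landmarks):                    # (B, T, num_landmarks, 2)
        return self.mlp(landmarks.flatten(2))        # (B, T, dim)


class SpatioTemporalTransformer(nn.Module):
    """Predicts quantized patch-token indices jointly over space and time."""
    def __init__(self, dim: int = 256, num_patches: int = 64,
                 codebook_size: int = 1024, depth: int = 4, heads: int = 8):
        super().__init__()
        self.patch_queries = nn.Parameter(torch.randn(num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, depth)
        self.to_logits = nn.Linear(dim, codebook_size)

    def forward(self, driving):                      # (B, T, dim) frame-level driving features
        b, t, d = driving.shape
        p = self.patch_queries.shape[0]
        # One learned query per spatial patch, conditioned on its frame's landmark features.
        tokens = self.patch_queries.unsqueeze(0).unsqueeze(0) + driving.unsqueeze(2)
        tokens = tokens.reshape(b, t * p, d)         # non-local attention over all space-time tokens
        tokens = self.transformer(tokens)
        return self.to_logits(tokens).reshape(b, t, p, -1)   # logits over codebook entries


# Usage sketch: predicted indices would index a (personalized) codebook and be
# passed to an image decoder (e.g. a VQGAN-style decoder) to synthesize frames.
if __name__ == "__main__":
    landmarks = torch.randn(2, 8, 68, 2)             # 2 clips, 8 frames each
    encoder = LandmarkEncoder()
    predictor = SpatioTemporalTransformer()
    logits = predictor(encoder(landmarks))           # (2, 8, 64, 1024)
    token_indices = logits.argmax(dim=-1)            # (2, 8, 64) codebook indices
    print(token_indices.shape)
```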