Abstract

Character reenactment aims to control a target person's full-head movement using a monocular driving video of another character. Current algorithms employ convolutional neural networks within generative adversarial networks, extracting historical and geometric information to iteratively generate video frames. However, convolutional neural networks capture only local information within limited receptive fields and ignore the global dependencies that are crucial for face synthesis, which leads to unnatural video frames. In this work, we design a progressive transformer module that combines multi-head self-attention with convolutional refinement to capture global and local dependencies simultaneously. Specifically, we employ non-overlapping window-based multi-head self-attention in a hierarchical architecture to obtain larger receptive fields on low-resolution feature maps and thus extract global information. To better model local dependencies, we introduce a convolution operation that further refines the attention weights in the multi-head self-attention mechanism. Finally, we stack several progressive transformer modules with down-sampling to encode the appearance information of previously generated frames and the parameterized 3D face information of the current frame, and similarly stack several progressive transformer modules with up-sampling to iteratively generate video frames. In this way, the model captures both global and local information, producing video frames that are globally natural while preserving sharp outlines and rich details. Extensive experiments on several standard benchmarks show that the proposed method outperforms current leading algorithms.
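To make the described attention mechanism concrete, the following is a minimal sketch (not the authors' released code) of a non-overlapping window-based multi-head self-attention block whose attention weights are refined by a convolution before the softmax. The class name, window size, feature dimensions, and the exact placement of the refinement convolution are illustrative assumptions; only the overall idea of combining windowed self-attention with convolutional refinement comes from the abstract.

```python
# Illustrative sketch, assuming PyTorch; hyperparameters and the exact form of the
# convolutional refinement are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class WindowAttentionWithConvRefinement(nn.Module):
    def __init__(self, dim, num_heads=4, window_size=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.window_size = window_size
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Depthwise 3x3 convolution over the per-head attention maps, used here
        # as one possible way to "refine" attention weights with a convolution.
        self.refine = nn.Conv2d(num_heads, num_heads, kernel_size=3,
                                padding=1, groups=num_heads)

    def forward(self, x):
        # x: (B, H, W, C) with H and W divisible by window_size
        B, H, W, C = x.shape
        ws = self.window_size
        # Partition the feature map into non-overlapping windows: (B*nW, ws*ws, C)
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

        Bw, N, _ = x.shape
        qkv = self.qkv(x).reshape(Bw, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (Bw, heads, N, d)

        attn = (q @ k.transpose(-2, -1)) * self.scale   # (Bw, heads, N, N)
        attn = attn + self.refine(attn)                 # convolutional refinement
        attn = attn.softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(Bw, N, C)
        out = self.proj(out)

        # Reverse the window partition back to (B, H, W, C)
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return out


if __name__ == "__main__":
    block = WindowAttentionWithConvRefinement(dim=64, num_heads=4, window_size=8)
    feats = torch.randn(2, 32, 32, 64)    # a low-resolution feature map
    print(block(feats).shape)             # torch.Size([2, 32, 32, 64])
```

In a full model, several such blocks would be stacked with down-sampling on the encoder side and up-sampling on the decoder side, as the abstract describes for the progressive transformer modules.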
