Abstract
Pose-invariant face recognition refers to the problem of identifying or verifying a person by analyzing face images captured from different poses. This problem is challenging due to the large variation of pose, illumination and facial expression. A promising approach to deal with pose variation is to fill in the incomplete UV maps extracted from in-the-wild faces, attach the completed UV map to a fitted 3D mesh, and finally generate 2D faces of arbitrary poses. The synthesized faces increase the pose variation for training deep face recognition models and reduce the pose discrepancy during the testing phase. In this paper, we propose a novel generative model called Attention ResCUNet-GAN to improve UV map completion. We enhance the original UV-GAN by using a pair of U-Nets. In particular, the skip connections within each U-Net are boosted by attention gates, while the features from the two U-Nets are fused with trainable scalar weights. Experiments on popular benchmarks, including the Multi-PIE, LFW, CPLFW and CFP datasets, show that the proposed method yields superior performance compared to other existing methods.
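The two architectural ideas named in the abstract, attention-gated skip connections and scalar-weighted fusion of features from two U-Nets, can be illustrated with a minimal sketch. This is not the authors' implementation: the additive form of the gate, the function names (`attention_gate`, `fuse`), the parameter names (`W_x`, `W_g`, `psi`, `w_a`, `w_b`) and the plain weighted-sum fusion rule are all assumptions made for illustration, and the code works on a single feature vector rather than a full feature map.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attention_gate(skip, gate, W_x, W_g, psi):
    """Additive attention gate at one spatial position (assumed form).

    skip: encoder feature vector carried by the skip connection
    gate: decoder (gating) feature vector
    Returns the skip vector scaled by a coefficient alpha in (0, 1), and alpha.
    """
    # Project both inputs, add them, and apply a ReLU.
    hidden = [max(0.0, a + b)
              for a, b in zip(matvec(W_x, skip), matvec(W_g, gate))]
    # Collapse to a single attention coefficient.
    alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)))
    return [alpha * s for s in skip], alpha


def fuse(feat_a, feat_b, w_a, w_b):
    """Fuse features from two U-Nets with scalar weights (assumed weighted sum).

    In a trained model w_a and w_b would be learnable parameters.
    """
    return [w_a * a + w_b * b for a, b in zip(feat_a, feat_b)]


identity = [[1.0, 0.0], [0.0, 1.0]]
gated, alpha = attention_gate([1.0, 2.0], [0.5, -0.5],
                              identity, identity, psi=[0.0, 0.0])
fused = fuse([1.0, 2.0], [3.0, 4.0], w_a=0.5, w_b=0.5)
```

With `psi` set to zero the gate is uninformative and passes half of the skip signal; training would adjust `psi`, `W_x` and `W_g` so that the coefficient rises for useful encoder features and falls for irrelevant ones.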
Highlights
Face recognition has gained much attention for decades [1,2,3].
Pose-invariant face recognition refers to the problem of identifying or verifying a person by analyzing face images captured from different poses.
One weakness of the original UV-GAN is the plain architecture of the generator, which is shown to be worse than residual networks [40].
Summary
Face recognition has gained much attention for decades [1,2,3]. Contrary to other popular biometrics, face recognition can be applied to uncooperative subjects in a non-intrusive manner. In [35], Deng et al. propose an adversarial UV map completion framework called UV-GAN to solve pose-invariant face recognition without the need for extensive pose coverage in the training dataset. One weakness of the original UV-GAN is the plain architecture of the generator, which is shown to be worse than residual networks [40]. Another weakness is that a single U-Net block does not seem sufficient to mix low-level information in the encoder well with high-level semantic features in the decoder. In [41], Deng et al. use UV-GAN with an architecture similar to that in [35] to extract side information as well as subspaces, and combine UV-GAN with robust PCA for the face recognition task. He et al. [42] introduce a framework for heterogeneous face synthesis from the near-infrared (NIR) to the visible domain.