Abstract

Recent years have witnessed significant progress in image-based 3D face reconstruction using deep convolutional neural networks. However, current reconstruction methods often perform poorly in self-occluded regions and can produce inaccurate correspondences between a 2D input image and a 3D face template, hindering their use in real applications. To address these problems, we propose a deep shape reconstruction and texture completion network, SRTC-Net, which jointly reconstructs 3D facial geometry and completes the texture, with correspondences, from a single input face image. In SRTC-Net, we leverage geometric cues from the completed 3D texture to reconstruct detailed structures of the 3D shape. The SRTC-Net pipeline has three stages. The first introduces a correspondence network to identify pixel-wise correspondences between the input 2D image and a 3D template model, and transfers the input 2D image to a U-V texture map. We then complete the invisible and occluded areas of the U-V texture map using an inpainting network. To obtain the 3D facial geometry, we predict a coarse shape (U-V position map) from the face segmented by the correspondence network using a shape network, and then refine this coarse shape by regressing a U-V displacement map from the completed U-V texture map in a pixel-to-pixel manner. We evaluate our method on 3D reconstruction as well as face frontalization and pose-invariant face recognition, using both in-the-lab datasets (MICC, MultiPIE) and an in-the-wild dataset (CFP). The qualitative and quantitative results demonstrate the effectiveness of our method at inferring 3D facial geometry and complete texture; it outperforms or is comparable to the state-of-the-art.
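For concreteness, the sketch below outlines the three-stage data flow described above in PyTorch-style code. The module names (CNet, INet, SNet, RefineNet), their interfaces, and the tensor layouts are illustrative assumptions made for this sketch, not the authors' released implementation.

```python
# Illustrative sketch of the SRTC-Net data flow (assumed interfaces, not the official code).
import torch
import torch.nn as nn

class SRTCNetSketch(nn.Module):
    def __init__(self, c_net: nn.Module, i_net: nn.Module,
                 s_net: nn.Module, refine_net: nn.Module):
        super().__init__()
        self.c_net = c_net            # correspondence network: image -> partial U-V texture + face mask
        self.i_net = i_net            # inpainting network: partial U-V texture -> completed texture
        self.s_net = s_net            # shape network: segmented face -> coarse U-V position map
        self.refine_net = refine_net  # regresses a U-V displacement map from the completed texture

    def forward(self, image: torch.Tensor):
        # Stage 1: pixel-wise correspondence; unwrap visible pixels into a U-V texture map.
        uv_texture_partial, face_mask = self.c_net(image)
        # Stage 2: fill the invisible / self-occluded U-V regions.
        uv_texture_full = self.i_net(uv_texture_partial)
        # Stage 3: coarse geometry from the segmented face, refined pixel-to-pixel
        # by a displacement map predicted from the completed texture.
        uv_position_coarse = self.s_net(image * face_mask)
        uv_displacement = self.refine_net(uv_texture_full)
        uv_position = uv_position_coarse + uv_displacement
        return uv_position, uv_texture_full
```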

Highlights

  • This paper attacks the problem of recovering 3D shape and complete facial texture from a single 2D face image

  • This paper proposes the deep SRTC-Net to attack the challenging problem of inferring 3D facial geometry and texture from a single image

  • Our method consists of three subnetworks, the correspondence network (C-Net), inpainting network (I-Net), and shape network (S-Net), which decompose this hard problem into several more tractable sub-problems


Summary

Introduction

This paper attacks the problem of recovering 3D shape and complete facial texture from a single 2D face image. Pixel-wise regression methods [17,18,19] directly predict the depth or 3D location of each pixel in the input and can be implemented effectively with modern deep networks, e.g., U-Net [20]. Their pixel-wise nature enables these methods to capture fine facial details, but they lack a dense correspondence between the 2D input image and a 3D facial template, which limits practical applications such as expression transfer and animation, and they cannot handle self-occluded regions well.
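To illustrate the pixel-wise regression idea, the following minimal U-Net-style sketch maps an RGB face image to a per-pixel 3D position map; the architecture and layer sizes are assumptions chosen for brevity, not those of the cited methods [17,18,19].

```python
# Minimal sketch of pixel-wise regression: a tiny U-Net-style encoder-decoder
# that predicts 3D coordinates (x, y, z) for every pixel of the input image.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = conv_block(64 + 32, 32)
        self.head = nn.Conv2d(32, 3, 1)  # 3 output channels = (x, y, z) per pixel

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d)

# Usage: regress a 3-channel position map at the same resolution as the input.
pred = TinyUNet()(torch.randn(1, 3, 128, 128))  # -> shape (1, 3, 128, 128)
```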

