Abstract
Depth images are widely used in 3D head pose estimation and face reconstruction. Device-specific noise and the lack of texture constraints pose a major problem for estimating a nonrigid deformable face from a single noisy depth image. In this article, we present a deep neural network-based framework to infer a 3D face consistent with a single depth image captured by a consumer depth camera (Kinect). Confronted with the lack of annotated depth images with facial parameters, we utilize a bidirectional CycleGAN-based generator for denoising and noisy-image simulation, which helps generalize the model learned from synthetic depth images to real noisy ones. We train code regressors in the source (synthetic) and target (noisy) depth image domains and present a fusion scheme in the parametric space for 3D face inference. The proposed multi-level shape consistency constraint, covering the embedded features, depth maps, and 3D surfaces, couples the code regressor with the domain adaptation and avoids shape distortions in the CycleGAN-based generators. Experiments demonstrate that the proposed method is effective in depth-based 3D head pose estimation and expressive face reconstruction compared with the state of the art.