Abstract

This work proposes a novel convolutional neural network architecture that locates facial landmarks accurately by learning their local responses. The network consists of a Conditional Variational Auto-Encoder (CVAE) and a Deep Convolutional Neural Network (DCNN). The CVAE learns the response maps of facial landmarks from face images, and the DCNN learns accurate landmark locations from the response maps and facial textures. The CVAE consists of a face encoder, which extracts high-level information from raw pixels, and a decoder, which outputs local response maps from the high-level coding. We formulate the CVAE used for capturing local responses as an optimization problem that can be solved through back-propagation. Extensive experiments show that the proposed CVAE learns better local response maps than a Fully Convolutional Network (FCN). Our method outperforms state-of-the-art methods on AFLW (5 points) and the challenging subset of 300-W (68 points), which indicates that it is advantageous under complex poses and expressions.
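
The abstract does not state the CVAE objective itself. As a rough illustration of what a CVAE-style optimization problem commonly looks like, the sketch below computes a generic loss: a reconstruction term on the predicted response map plus a KL-divergence term that pulls the encoder's Gaussian code toward a standard normal prior. The function names, the squared-error reconstruction term, the standard normal prior, and the `kl_weight` parameter are all assumptions made for illustration, not the formulation derived in the paper.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps it differentiable with respect to
    mu and log_var, which is what lets back-propagation train the encoder.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def squared_error(predicted_map, target_map):
    """Sum of squared differences between two flattened response maps."""
    return sum((p - t) ** 2 for p, t in zip(predicted_map, target_map))


def cvae_loss(predicted_map, target_map, mu, log_var, kl_weight=1.0):
    """Reconstruction error plus weighted KL term (hypothetical weighting)."""
    reconstruction = squared_error(predicted_map, target_map)
    regularizer = kl_to_standard_normal(mu, log_var)
    return reconstruction + kl_weight * regularizer


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -0.1], [0.0, -0.5]
    z = reparameterize(mu, log_var, rng)  # latent code a decoder would consume
    print(len(z), cvae_loss([0.9, 0.1], [1.0, 0.0], mu, log_var))
```

In a real network, `predicted_map` would come from the decoder applied to the sampled code and the encoded face image, and both terms would be minimized jointly by gradient descent.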
