Abstract
Facial landmark detection has gained enormous interest for face-related applications due to its success in facial analysis tasks such as facial recognition, cartoon generation, face tracking and facial expression analysis. Many studies have been proposed and implemented to deal with the challenging problems of localizing facial landmarks from given images, including large appearance variations and partial occlusion. Studies have differed in the way they use the facial appearances and shape information of input images. In our work, we consider facial information within both global and local contexts. We aim to obtain local pixel-level accuracy for local-context information in the first stage and integrate this with knowledge of spatial relationships between each key point in a whole image for global-context information in the second stage. Thus, the pipeline of our architecture consists of two main components: (1) a deep network for local-context subnet that generates detection heatmaps via fully convolutional DenseNets with additional kernel convolution filters and (2) a dilated skip convolution subnet—a combination of dilated convolutions and skip-connections networks—that are in charge of robustly refining the local appearance heatmaps. Through this proposed architecture, we demonstrate that our approach achieves state-of-the-art performance on challenging datasets—including LFPW, HELEN, 300W and AFLW2000-3D—by leveraging fully convolutional DenseNets, skip-connections and dilated convolution architecture without further post-processing.
Highlights
In computer vision, facial landmark detection is known as face alignment and is a crucial part of face recognition operations
local appearance initialization (LAI) pursues a heatmap regression approach convolved with kernel convolution to serve as a local detector of facial landmarks and the dilated skip convolution (DSC) subnet is designed to refine the local prediction of the first subnet
We evaluated for accuracy with three popular metrics: the normalized mean error (NME), the cumulative error distribution (CED) curve and the area under the curve (AUC)
Summary
Facial landmark detection is known as face alignment and is a crucial part of face recognition operations. Its algorithms attempt to predict the locations of the fiducial facial landmark coordinates that vary owing to head movements and facial expressions. These landmarks are located at major parts of the face, such as the contours, tip of the nose, chin, eyes, corners of the mouth (see [1] in review). Facial landmark detection has sparked much interest recently as it is a prerequisite in many computer vision applications, including facial recognition [2], facial emotion recognition [3,4], face morphing [2,5], 3D face modelling [6] and human-computer interactions [7]. Facial landmark detection is classified into three types of methods: holistic, constrained local model (CLM), and regression-based.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.