Abstract
In this article, we present the Menpo 2D and Menpo 3D benchmarks, two new datasets for multi-pose 2D and 3D facial landmark localisation and tracking. In contrast to previous benchmarks such as 300W and 300VW, the proposed benchmarks contain facial images in both semi-frontal and profile poses. We introduce an elaborate semi-automatic methodology for providing high-quality annotations for both the Menpo 2D and Menpo 3D benchmarks. In the Menpo 2D benchmark, different visible landmark configurations are designed for semi-frontal and profile faces, thus making the 2D face alignment full-pose. In the Menpo 3D benchmark, a unified landmark configuration is designed for both semi-frontal and profile faces based on the correspondence with a 3D face model, thus making face alignment not only full-pose but also corresponding to the real-world 3D space. Based on the considerable number of annotated images, we organised the Menpo 2D Challenge and the Menpo 3D Challenge for face alignment under large pose variations in conjunction with CVPR 2017 and ICCV 2017, respectively. The results of these challenges demonstrate that recent deep learning architectures, when trained with abundant data, lead to excellent results. We also provide a very simple yet effective solution, named Cascade Multi-view Hourglass Model, to 2D and 3D face alignment. In our method, we take advantage of all 2D and 3D facial landmark annotations in a joint way. We not only capitalise on the correspondences between the semi-frontal and profile 2D facial landmarks but also employ joint supervision from both 2D and 3D facial landmarks. Finally, we discuss future directions on the topic of face alignment.
Highlights
Facial landmark localisation and tracking on images and videos captured in unconstrained recording conditions is a problem that has received a lot of attention during the past few years
The method of Yang et al (2017) was the best-performing method in both the semi-frontal and profile categories and won the competition. As is customary in landmark evaluation papers (Sagonas et al 2013, 2016), we provide performance graphs excluding the boundary landmarks for both semi-frontal and profile faces
The Menpo 2D dataset provides different landmark configurations for semi-frontal and profile faces based on the visible landmarks, making the 2D face alignment full-pose
Summary
Facial landmark localisation and tracking on images and videos captured in unconstrained recording conditions is a problem that has received a lot of attention during the past few years. Methodologies (Xiong and De la Torre 2013; Ren et al 2014; Zhu et al 2016a; Trigeorgis et al 2016; Güler et al 2017; Bulat and Tzimiropoulos 2017a, b; Honari et al 2018) that achieve good performance in facial landmark localisation have been presented in recent top-tier computer vision conferences (e.g., CVPR, ICCV, ECCV). This progress would not have been feasible without the efforts made by the scientific community to design and develop benchmarks with high-quality landmark annotations (Sagonas et al 2013, 2016; Belhumeur et al 2013; Le et al 2012; Zhu and Ramanan 2012; Köstinger et al 2011), as well as rigorous protocols for performance assessment. It provides publicly available annotations for images, which originate from the following “in-the-wild” datasets: