Abstract

We present a novel approach for automatically detecting and tracking facial landmarks across poses and expressions from in-the-wild monocular video data, e.g., YouTube videos and smartphone recordings. Our method does not require any calibration or manual adjustment for new input videos or actors. First, we propose a robust 2D facial landmark detection method that works across poses, combining shape-face canonical-correlation analysis with a global supervised descent method. Since 2D regression-based methods are sensitive to unstable initialization and ignore the temporal and spatial coherence of videos, we utilize a coarse-to-dense 3D facial expression reconstruction method to refine the 2D landmarks. On the one hand, we employ an in-the-wild method to extract the coarse reconstruction result and its corresponding texture using the detected sparse facial landmarks, followed by robust pose, expression, and identity estimation. On the other hand, to obtain dense reconstruction results, we introduce a face tracking flow method that corrects coarse reconstruction results and tracks weakly textured areas; this is used to iteratively update the coarse face model, and a dense reconstruction result is obtained after convergence. Extensive experiments on a variety of video sequences recorded by ourselves or downloaded from YouTube show the results of facial landmark detection and tracking under various lighting conditions, head poses, and facial expressions. The overall performance and a comparison with state-of-the-art methods demonstrate the robustness and effectiveness of our method.
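
The supervised descent method (SDM) mentioned above can be sketched as a cascade of learned linear updates; the following is a generic SDM-style step (learned ridge regressors from features at the current landmark estimate to the shape residual), not the authors' exact implementation — the feature extractor and training data here are toy stand-ins:

```python
import numpy as np

def train_sdm_step(features, residuals, ridge=1e-3):
    """Learn one descent step: a linear map from feature vectors phi(x_k)
    at the current landmark estimate to the residual x* - x_k toward the
    ground-truth shape. Ridge-regularized least squares."""
    F = np.hstack([features, np.ones((features.shape[0], 1))])  # bias column
    R = np.linalg.solve(F.T @ F + ridge * np.eye(F.shape[1]), F.T @ residuals)
    return R

def apply_sdm_step(x, phi, R):
    """One update of the cascade: x_{k+1} = x_k + [phi(x_k); 1] R."""
    f = np.append(phi(x), 1.0)
    return x + f @ R
```

In a real cascade, `phi` would extract local appearance features (e.g., SIFT patches) around each landmark, and several steps `R_0, R_1, ...` would be trained and applied in sequence.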

Highlights

  • Robust facial landmark detection and tracking across poses and expressions from in-the-wild monocular video data, e.g., YouTube videos and smartphone recordings

  • To reduce the amount of manual labor, an ideal face capture solution should automatically provide the facial shape with high performance given reasonable-quality input videos

  • We have proposed a novel fully automated method for robust facial landmark detection and tracking across poses and expressions for in-the-wild monocular videos


Introduction

We address facial landmark detection and tracking across poses and expressions from in-the-wild monocular video data, e.g., YouTube videos and smartphone recordings. We employ an in-the-wild method to extract the coarse reconstruction result and its corresponding texture using the detected sparse facial landmarks, followed by robust pose, expression, and identity estimation. Cao et al. [16] extended the 3D dynamic expression model to work even with monocular video, with improved performance of facial landmark detection and tracking. Such methods work well with indoor videos for a range of expressions, but tend to fail for videos captured in the wild (ITW) due to uncontrollable lighting, varying backgrounds, and partial occlusions; sensitivity to shadows, light variations, and occlusion makes them difficult to apply in noisy uncontrolled environments. To this end, we have designed a new ITW facial landmark detection and tracking method that employs optical flow to enhance the expressiveness of captured facial landmarks. Our contributions are threefold: a novel robust 2D facial landmark detection method that works across a range of poses, based on combining shape-face CCA with SDM; and a novel 3D facial optical flow tracking method.
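
The optical-flow refinement idea can be sketched with a single-window Lucas-Kanade step; this is the textbook building block, not the paper's full face tracking flow, and the toy example below assumes a small pure translation between two grayscale patches:

```python
import numpy as np

def lucas_kanade_translation(I0, I1):
    """Estimate a single (dx, dy) translation between two grayscale
    patches by solving the least-squares brightness-constancy system
    [Ix Iy] d = -It over all pixels in the window."""
    Iy, Ix = np.gradient(I0.astype(float))   # spatial gradients (rows=y, cols=x)
    It = I1.astype(float) - I0.astype(float)  # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    d, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return d  # (dx, dy) in pixels
```

In a full tracker, such a flow estimate would be computed per landmark window (typically pyramidally and iteratively) and used to correct the coarse reconstruction in weakly textured areas.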
