Abstract

Facial Action Unit (AU) intensity prediction is essential to facial expression analysis and emotion recognition, and has therefore attracted much attention from the community. In comparison, AU localization, despite its importance to emotion visualization and tracking, has been relatively unexplored. Moreover, since most existing AU intensity prediction methods take a cropped face image as input, their run-time speed is often penalized by pre-processing steps such as face detection and alignment, and their inference speed does not scale well to multi-face images. To alleviate these problems, we propose a joint AU intensity prediction and localization method that operates directly on the whole input image, eliminating the need for any pre-processing step and achieving the same inference speed regardless of the number of faces in the image. Based on the observation that different degrees of relevance exist between AU intensity categories, a flexible cost function is proposed. At inference time, we introduce a non-maximum intensity suppression model to refine the predictions. To leverage existing datasets without AU region ground truth, we also propose an automatic AU region labeling method. Experiments on two benchmark databases, DISFA and FERA2015, show that the proposed approach outperforms state-of-the-art methods on three metrics (ICC, MAE, and F1) for the AU intensity prediction task.
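The abstract does not detail the non-maximum intensity suppression step, but it is plausibly analogous to the standard non-maximum suppression used in object detection, with predicted AU intensities acting as the ranking scores. The sketch below is an assumption-laden illustration of that idea, not the paper's actual algorithm: candidate AU regions (axis-aligned boxes) are kept in decreasing order of intensity, and any lower-intensity region overlapping a kept one beyond an IoU threshold is suppressed.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_intensity_suppression(boxes, intensities, iou_thresh=0.5):
    """Keep the highest-intensity region among heavily overlapping candidates.

    boxes       -- list of (x1, y1, x2, y2) candidate AU regions
    intensities -- predicted intensity score per box (hypothetical scores)
    Returns the indices of the surviving boxes.
    """
    # Process candidates from highest to lowest predicted intensity.
    order = sorted(range(len(boxes)), key=lambda i: intensities[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept
```

For example, two near-duplicate candidate regions for the same AU would collapse to the single higher-intensity one, while a region on a second face (no overlap) survives untouched.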
