Abstract

Facial action unit (AU) detection plays an important role in facial behavioral analysis of raw video input. Overall, three key factors contribute to the optimal performance of AU detectors: 1) capturing local AU-centered features; 2) exploiting the fact that some AUs co-occur with others; and 3) utilizing appearance changes across frames. We briefly review current techniques that address each factor and discuss the challenges they face. Given that very few works consider how to effectively and efficiently merge all three into a single framework that can be trained in an end-to-end manner, we propose AU-Net, a simple yet strong baseline for landmark-based AU detection. AU-Net implements the above key factors by: 1) using the intermediate layers of a pretrained face alignment model as the AU feature space; 2) optimizing the features to satisfy a correlation constraint derived from the AU labels; and 3) imposing a temporal constraint derived from variations in the content of consecutive frames of the input videos. Despite its three key components, the proposed model remains simple in nature and well aligned with the primary AU detection task. Experiments show that it substantially improves AU detection accuracy and achieves new state-of-the-art results on popular benchmarks: BP4D and DISFA. Code is available at https://github.com/jingyang2017/AU-Net.
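
To make the three ingredients concrete, the sketch below shows one way they could fit together: intermediate features from a frozen pretrained face alignment backbone feed per-AU classifiers, whose outputs are regularized by a correlation term derived from the AU labels and a frame-to-frame temporal term. This is a minimal PyTorch sketch under stated assumptions; the names (AUBaseline, correlation_loss, temporal_loss), the backbone interface, and the loss weights are illustrative, not the authors' actual implementation.

```python
# Minimal sketch of the three components described in the abstract.
# Assumptions: `backbone` is a pretrained face alignment model whose
# intermediate layers return one feature vector of size `feat_dim`
# per input frame; all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AUBaseline(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_aus: int = 12):
        super().__init__()
        # 1) Intermediate layers of a pretrained face alignment model
        #    act as the AU feature space (kept frozen in this sketch).
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.au_head = nn.Linear(feat_dim, num_aus)  # per-AU logits

    def forward(self, frames):                       # frames: (B, T, C, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))  # (B*T, feat_dim), assumed
        logits = self.au_head(feats)
        return logits.view(b, t, -1)                 # (B, T, num_aus)

def correlation_loss(probs, label_corr):
    """2) Match the co-occurrence structure of the predictions to
    `label_corr`, a (num_aus x num_aus) correlation matrix
    precomputed from the training labels."""
    p = probs.flatten(0, 1)                          # (N, num_aus)
    p = p - p.mean(dim=0, keepdim=True)
    pred_corr = (p.t() @ p) / (p.norm(dim=0).unsqueeze(0) *
                               p.norm(dim=0).unsqueeze(1) + 1e-8)
    return F.mse_loss(pred_corr, label_corr)

def temporal_loss(probs):
    """3) Penalize abrupt prediction changes between consecutive frames."""
    return (probs[:, 1:] - probs[:, :-1]).abs().mean()

def total_loss(logits, labels, label_corr, lam_c=0.1, lam_t=0.1):
    """Detection loss plus the two constraints; the weights are assumed."""
    probs = logits.sigmoid()
    return (F.binary_cross_entropy_with_logits(logits, labels)
            + lam_c * correlation_loss(probs, label_corr)
            + lam_t * temporal_loss(probs))
```

In this sketch, label_corr is fixed once from the training labels, so the correlation term only nudges the co-occurrence statistics of the predictions toward those observed in the data, while the temporal term exploits the similarity of consecutive frames.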
