Abstract

Facial action units (AUs) relate to specific local facial regions. Recent efforts in automated AU detection have focused on learning facial patch representations to detect specific AUs. These efforts have encountered three hurdles. First, they implicitly assume that facial patches are robust to head rotation, yet non-frontal rotation is common. Second, mappings between AUs and patches are defined a priori, which ignores co-occurrences among AUs. Third, the dynamics of AUs are either ignored or modeled sequentially rather than simultaneously, as in human perception. Inspired by recent advances in human perception, we propose a dynamic patch-attentive deep network, called D-PAttNet, for AU detection that (i) controls for 3D head and face rotation, (ii) learns mappings of patches to AUs, and (iii) models spatiotemporal dynamics. The D-PAttNet approach significantly improves upon the existing state of the art.
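To make the architecture concrete, the following minimal PyTorch sketch illustrates the three ingredients in combination: per-patch CNN encoders, a learned sigmoid attention gate that maps patch encodings to AU prediction, and a recurrent layer over frames for spatiotemporal dynamics. All module choices, patch counts, and dimensions here are illustrative assumptions, not the authors' implementation (which also involves the 3D registration step omitted here).

    import torch
    import torch.nn as nn

    class PatchAttentiveAUNet(nn.Module):
        """Illustrative sketch (assumptions throughout): a shared CNN
        encodes each facial patch, a sigmoid attention weight gates each
        patch encoding, and a GRU models dynamics across frames."""

        def __init__(self, n_patches=9, n_aus=10, feat_dim=64):
            super().__init__()
            # Small CNN encoder shared across patches (assumed design).
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),
                nn.Flatten(),
                nn.Linear(16 * 4 * 4, feat_dim), nn.ReLU(),
            )
            # One learned attention logit per patch encoding.
            self.attn = nn.Linear(feat_dim, 1)
            # GRU over the frame sequence captures AU dynamics.
            self.gru = nn.GRU(feat_dim * n_patches, 128, batch_first=True)
            self.head = nn.Linear(128, n_aus)  # one logit per AU

        def forward(self, patches):
            # patches: (batch, time, n_patches, 3, H, W)
            b, t, p, c, h, w = patches.shape
            x = self.encoder(patches.reshape(b * t * p, c, h, w))
            x = x.reshape(b, t, p, -1)
            # Gate each patch encoding by its sigmoid attention weight.
            x = x * torch.sigmoid(self.attn(x))
            out, _ = self.gru(x.reshape(b, t, -1))
            return self.head(out[:, -1])  # per-AU logits for the clip

    # Toy usage: 2 clips, 8 frames, 9 patches of 32x32 RGB pixels.
    logits = PatchAttentiveAUNet()(torch.randn(2, 8, 9, 3, 32, 32))
    print(logits.shape)  # torch.Size([2, 10])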

Highlights

  • Facial actions communicate intention, emotion, and physical state (Tian et al., 2001)

  • We compare the performance of the dynamic patch-attentive deep network (D-PAttNet) with the following state-of-the-art approaches: linear SVM (LSVM) trains an SVM classifier on SIFT features extracted from individual frames, without patch learning (a minimal sketch of this baseline follows this list)

  • Deep region and multilabel learning (DRML) (Zhao et al., 2016b) combines region learning and multilabel learning for action unit (AU) detection
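As referenced in the first baseline above, here is a minimal sketch of an LSVM-style pipeline: SIFT descriptors are computed per frame and fed to a linear SVM, with no patch learning. The dense-grid keypoints and the random stand-in data are assumptions for illustration; the baseline's actual feature extraction setup is not reproduced here.

    import cv2
    import numpy as np
    from sklearn.svm import LinearSVC

    def sift_descriptor(gray, grid=5, size=16.0):
        """Dense SIFT over a fixed grid (an illustrative stand-in for
        whatever keypoint scheme a real pipeline would use)."""
        sift = cv2.SIFT_create()
        h, w = gray.shape
        kps = [cv2.KeyPoint(float(x), float(y), size)
               for y in np.linspace(size, h - size, grid)
               for x in np.linspace(size, w - size, grid)]
        _, desc = sift.compute(gray, kps)
        return desc.reshape(-1)  # one fixed-length vector per frame

    # Hypothetical data: grayscale frames with binary labels for one AU.
    frames = [np.random.randint(0, 255, (128, 128), np.uint8) for _ in range(40)]
    labels = np.random.randint(0, 2, 40)

    X = np.stack([sift_descriptor(f) for f in frames])
    clf = LinearSVC(C=1.0).fit(X, labels)  # in practice, one SVM per AU
    print(clf.score(X, labels))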


Summary

Introduction

Facial actions communicate intention, emotion, and physical state (Tian et al., 2001). Models of human face perception implicate a hierarchy of regions that includes the occipital face area (OFA), the fusiform face area (FFA), and the superior temporal sulcus (STS). The STS is sensitive to facial dynamics and represents changeable aspects of faces such as expression, lip movement, and eye gaze (Hoffman and Haxby, 2000). The anatomical location of the OFA suggests that it provides input to both the FFA and the STS. This system is consistent with hierarchical models (Grill-Spector and Malach, 2004; Fairhall and Ishai, 2006) proposing that complex visual objects are recognized via a series of stages in which features of increasing complexity are extracted and analyzed at progressively higher levels of the visual processing stream (Pitcher et al., 2011). The success of many human-inspired approaches in machine learning raises the question: can we model machine perception of facial actions with a hierarchical system analogous to the proposed models of human perception of faces and facial action?
