Abstract

Automatic facial Action Unit (AU) detection is the recognition of facial appearance changes caused by the contraction or relaxation of one or more related facial muscles. Static image-based AU detection typically underperforms sequence-based methods because temporal information is lost. To address this, we propose a novel method that implicitly learns temporal information from a single image by adding a hidden optical-flow layer that concatenates two Convolutional Neural Network (CNN) models: an optical-flow net (OF-Net) and an AU detection net (AU-Net). The OF-Net estimates facial appearance changes (optical flow) from a single input image through unsupervised learning. The AU-Net takes the estimated optical flow as input and predicts AU occurrence. Training the two networks jointly yields better performance than training them separately: the AU-Net provides semantic constraints for optical-flow learning and helps generate more meaningful flow, and in return the estimated optical flow, which reflects facial appearance changes, benefits the AU-Net. We evaluate the proposed method on two benchmarks, BP4D and DISFA, and the experiments show significant performance improvement over state-of-the-art methods.
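The joint objective described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the real OF-Net and AU-Net are CNNs, and the helper names (`of_net`, `au_net`, `warp`), the toy image size, and the linear stand-in layers are all assumptions made for brevity. The sketch shows only how an unsupervised photometric loss on the predicted flow and a supervised AU loss combine into one joint loss.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8      # toy image size (the real model operates on face crops)
NUM_AUS = 12   # e.g. BP4D annotates 12 AUs

def of_net(image, params):
    """Stand-in for the OF-Net: predicts a dense (dx, dy) flow field.
    A single linear map replaces the real convolutional network."""
    return (params @ image.ravel()).reshape(2, H, W)

def warp(image, flow):
    """Backward-warp the image by the predicted flow (nearest-neighbour
    sampling for brevity; a real model would use bilinear warping)."""
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    return image[src_y, src_x]

def au_net(flow, params):
    """Stand-in for the AU-Net: maps the flow field to per-AU probabilities."""
    logits = params @ flow.ravel()
    return 1.0 / (1.0 + np.exp(-logits))

# Illustrative random parameters and a toy training sample.
of_params = rng.normal(scale=0.01, size=(2 * H * W, H * W))
au_params = rng.normal(scale=0.01, size=(NUM_AUS, 2 * H * W))
image = rng.random((H, W))      # single input frame
target = rng.random((H, W))     # expressive frame used only during training
au_labels = rng.integers(0, 2, NUM_AUS).astype(float)

flow = of_net(image, of_params)

# Unsupervised photometric loss: warping the input by the predicted flow
# should reproduce the target frame (no flow ground truth is needed).
photometric_loss = np.mean((warp(image, flow) - target) ** 2)

# Supervised AU loss: binary cross-entropy on AU occurrence labels.
probs = au_net(flow, au_params)
bce = -np.mean(au_labels * np.log(probs + 1e-8)
               + (1.0 - au_labels) * np.log(1.0 - probs + 1e-8))

# Joint objective: optimizing both terms together lets AU supervision
# shape the learned flow, while the flow feeds the AU predictor.
joint_loss = photometric_loss + bce
```

In joint training, gradients from `bce` flow back through `au_net` into the flow estimator, which is how the AU-Net imposes semantic constraints on the learned optical flow.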
