Abstract

Facial action unit (AU) detection is a challenging task due to the variety and subtlety of individuals' facial behavior. Characteristics of facial muscles, such as temporal dependencies and action correlations, distinguish AU detection from general multi-label classification tasks, and capturing these two characteristics is key to accurate AU detection. However, little work to date has considered both concurrently. To capture the AU correlations within an image, we first disentangle the global (image) feature into multiple AU-specific features with an AU contrastive loss, and then compute the feature for each AU by aggregating the features of the other AUs with a self-attention-based transformer. Unlike the original transformer, we embed an AU semantic dependency matrix into it to weakly guide the attention learning. We then perform a weighted fusion of the AU-wise features to obtain frame-wise features. We further capture the temporal dependencies among frames with another attention-based transformer, which aggregates information from preceding frames. Extensive experiments on two benchmark datasets (i.e., BP4D and DISFA) demonstrate that the proposed framework outperforms state-of-the-art approaches.
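To make the attention-with-dependency-matrix idea concrete, below is a minimal sketch (not the authors' code) of self-attention over per-AU features in which a precomputed AU semantic dependency matrix is added as a bias to the attention logits, followed by a learned weighted fusion of the AU-wise features into a frame-wise feature. All names, dimensions, and the additive-bias formulation are assumptions for illustration only.

```python
# Hypothetical sketch: attention over AU-specific features biased by an
# AU semantic dependency matrix, plus weighted fusion to a frame feature.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AURelationAttention(nn.Module):
    def __init__(self, num_aus: int, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Learned per-AU weights for fusing AU features into one frame feature.
        self.fusion_logits = nn.Parameter(torch.zeros(num_aus))
        self.scale = dim ** -0.5

    def forward(self, au_feats: torch.Tensor, dep_matrix: torch.Tensor):
        # au_feats: (batch, num_aus, dim); dep_matrix: (num_aus, num_aus)
        q, k, v = self.q(au_feats), self.k(au_feats), self.v(au_feats)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        # Add the AU semantic dependency matrix as a bias that weakly
        # guides which AUs attend to which others.
        attn = F.softmax(logits + dep_matrix, dim=-1)
        au_out = attn @ v                                     # (batch, num_aus, dim)
        w = F.softmax(self.fusion_logits, dim=0)              # (num_aus,)
        frame_feat = (w.unsqueeze(-1) * au_out).sum(dim=1)    # (batch, dim)
        return au_out, frame_feat

# Toy usage with made-up sizes: 12 AUs, 64-dim features.
feats = torch.randn(2, 12, 64)
dep = torch.rand(12, 12)  # e.g., AU co-occurrence statistics, assumed precomputed
au_out, frame_feat = AURelationAttention(12, 64)(feats, dep)
```

A temporal transformer over the resulting frame-wise features could follow the same pattern, with attention restricted to preceding frames.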
