Abstract
The performance of automated facial expression coding is improving steadily. Advances in deep learning techniques have been key to this success. While the advantage of modern deep learning techniques is clear, the contribution of critical design choices remains largely unknown, especially for facial action unit occurrence and intensity across pose. Using the Facial Expression Recognition and Analysis 2017 (FERA 2017) database, which provides a common protocol to evaluate robustness to pose variation, we systematically evaluated design choices in pre-training, feature alignment, model size selection, and optimizer details. Informed by the findings, we developed an architecture that exceeds the state of the art on FERA 2017. The architecture achieved a 3.5% increase in F1 score for occurrence detection and a 5.8% increase in intraclass correlation (ICC) for intensity estimation. To evaluate the generalizability of the architecture to unseen poses and new dataset domains, we performed experiments across pose in FERA 2017 and across domains in the Denver Intensity of Spontaneous Facial Action (DISFA) dataset and the UNBC Pain Archive.
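For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how F1 (for occurrence detection) and ICC (for intensity estimation) can be computed for a single action unit. It is not the authors' evaluation code. The function names `f1_score_binary` and `icc_3_1` are hypothetical, and the choice of the ICC(3,1) consistency variant is an assumption, since this section does not say which ICC form was used.

```python
import numpy as np


def f1_score_binary(y_true, y_pred):
    """F1 for binary AU occurrence labels (1 = AU present, 0 = absent)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # true positives
    fp = np.sum(~y_true & y_pred)   # false positives
    fn = np.sum(y_true & ~y_pred)   # false negatives
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no positives at all.
    return float(2 * tp / denom) if denom > 0 else 0.0


def icc_3_1(y_true, y_pred):
    """ICC(3,1) between ground-truth and predicted intensities.

    Assumption: the two "raters" are the manual coding and the model output,
    and the consistency form ICC(3,1) is the variant of interest.
    Requires at least two frames.
    """
    y = np.column_stack([np.asarray(y_true, float), np.asarray(y_pred, float)])
    n, k = y.shape                  # n frames, k = 2 raters
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)  # between-frame
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)  # between-rater
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-frame mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    denom = bms + (k - 1) * ems
    # Undefined when both inputs are constant; return NaN in that case.
    return float((bms - ems) / denom) if denom > 0 else float("nan")
```

Because ICC(3,1) measures consistency rather than absolute agreement, a prediction that is offset from the ground truth by a constant still scores 1 under this sketch. A different ICC variant would penalize that offset.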
Highlights
Emotion recognition technologies play an important role in human-computer interaction systems
We achieved state-of-the-art performance in both the occurrence detection and the intensity estimation sub-challenges of FERA 2017 (Valstar et al., 2017) and state-of-the-art cross-domain generalizability to the Denver Intensity of Spontaneous Facial Action (DISFA) dataset (Mavadati et al., 2013)
The aim of this study is to investigate the key parameters for both action unit (AU) occurrence detection and intensity estimation and to discover the optimal configuration
Summary
Emotion recognition technologies play an important role in human-computer interaction systems. Facial action units (AUs) (Ekman et al., 2002), which correspond to discrete muscle contractions, have been widely used for this purpose. In the last half decade, automated facial affect recognition (AFAR) systems have made major advances in detecting the occurrence and intensity of facial actions. Because non-frontal face views occur commonly in less constrained settings, robustness to pose variation is essential. The Facial Expression Recognition and Analysis 2017 (FERA 2017) challenge provided the first common protocol to evaluate robustness to pose variation (Valstar et al., 2017). In FERA 2017, deep learning (DL)-based approaches achieved the best performance in the sub-challenges for occurrence detection (Tang et al., 2017) and intensity estimation (Zhou et al., 2017).