Abstract

Automatic understanding of human affect from visual signals is of great importance in everyday human–machine interactions. Human emotional states, behaviors and reactions displayed in real-world settings can be appraised using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative an emotion is) and arousal (i.e., how strongly the emotion is activated) constitute popular and effective representations of affect. Nevertheless, the majority of datasets collected thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report the results of the First Affect-in-the-wild Challenge (Aff-Wild Challenge), recently organized on the Aff-Wild database in conjunction with CVPR 2017, which was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture that predicts continuous emotion dimensions from visual cues. The proposed deep learning architecture, AffWildNet, combines convolutional and recurrent neural network layers, exploiting the invariant properties of convolutional features while modeling the temporal dynamics of human behavior via the recurrent layers. AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the Aff-Wild database to learn features that can be used as priors for achieving the best performance for both dimensional and categorical emotion recognition on the RECOLA, AFEW-VA and EmotiW 2017 datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge.
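
The abstract describes a CNN-plus-recurrent design: per-frame convolutional features fed to recurrent layers that model temporal dynamics, ending in two continuous outputs (valence, arousal). The following is a minimal illustrative sketch of that kind of architecture, assuming PyTorch; the layer sizes and the small convolutional stack are assumptions for the example, not the published AffWildNet configuration.

    # Illustrative sketch only: CNN feature extractor + GRU temporal model
    # + valence/arousal regression head. Layer sizes are assumptions, not
    # the published AffWildNet configuration.
    import torch
    import torch.nn as nn

    class CnnGruAffectNet(nn.Module):
        def __init__(self, hidden_size=128):
            super().__init__()
            # Small convolutional stack; the paper's network is deeper.
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.gru = nn.GRU(64, hidden_size, batch_first=True)
            # Two continuous outputs per frame: valence and arousal in [-1, 1].
            self.head = nn.Sequential(nn.Linear(hidden_size, 2), nn.Tanh())

        def forward(self, frames):  # frames: (batch, time, 3, H, W)
            b, t = frames.shape[:2]
            feats = self.cnn(frames.flatten(0, 1)).flatten(1)  # (b*t, 64)
            out, _ = self.gru(feats.view(b, t, -1))            # (b, t, hidden)
            return self.head(out)                              # (b, t, 2)

    # Usage: per-frame valence/arousal predictions for 16-frame clips.
    clip = torch.randn(2, 16, 3, 96, 96)
    preds = CnnGruAffectNet()(clip)  # shape (2, 16, 2)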

Highlights

  • Current research in automatic analysis of facial affect aims at developing systems, such as robots and virtual humans, that will interact with humans in a naturalistic way under real-world settings

  • We fine-tune the AffWildNet on the REmote COLlaborative and Affective interactions (RECOLA) database and, for comparison purposes, we train on RECOLA an architecture comprising a residual neural network (ResNet-50) with a 2-layer Gated Recurrent Unit (GRU) stacked on top (a minimal sketch follows these highlights)

  • The model fine-tuned on the Aff-Wild database clearly achieves much higher performance on both valence and arousal than the ResNet-GRU model
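
The comparison model named above is a ResNet-50 backbone with a 2-layer GRU stacked on top. Below is a minimal sketch of that architecture, assuming PyTorch and torchvision; the hidden size and the choice of initial weights are assumptions for the example.

    # Illustrative sketch of the comparison model from the highlights:
    # ResNet-50 backbone + 2-layer GRU + valence/arousal head.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class ResNetGru(nn.Module):
        def __init__(self, hidden_size=128):
            super().__init__()
            backbone = resnet50(weights=None)  # optionally ImageNet weights
            backbone.fc = nn.Identity()        # expose 2048-d frame features
            self.backbone = backbone
            self.gru = nn.GRU(2048, hidden_size, num_layers=2, batch_first=True)
            self.head = nn.Linear(hidden_size, 2)  # valence, arousal

        def forward(self, frames):  # frames: (batch, time, 3, H, W)
            b, t = frames.shape[:2]
            feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
            out, _ = self.gru(feats)
            return torch.tanh(self.head(out))  # (b, t, 2) in [-1, 1]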



Introduction

Current research in automatic analysis of facial affect aims at developing systems, such as robots and virtual humans, that will interact with humans in a naturalistic way under real-world settings. Some representative datasets, which are still used in many recent works (Jung et al. 2015), are the Cohn–Kanade database (Tian et al. 2001; Lucey et al. 2010), the MMI database (Pantic et al. 2005; Valstar and Pantic 2010), the Multi-PIE database (Gross et al. 2010) and the BU-3D and BU-4D databases (Yin et al. 2006, 2008). It is accepted by the community that the facial expressions of naturalistic behaviors can be radically different from posed ones (Corneanu et al. 2016; Sariyanidi et al. 2015; Zeng et al. 2009). All the above databases have been captured in well-controlled recording conditions and mainly under a strictly defined scenario eliciting pain.

