Abstract

Automatic continuous prediction of affective states from naturalistic facial expressions is a challenging but important research topic in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods produce a prediction for each individual video frame; in the second stage, a time-delay neural network (TDNN) is proposed to model the temporal relationships between consecutive predictions. The two-stage approach separates the modeling of emotional state dynamics from the per-frame emotional state prediction based on input features. In doing so, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, allowing the network to more easily exploit the slowly changing dynamics between emotional states. The system was fully tested and evaluated on three different facial expression video datasets. Our experimental results demonstrate that the two-stage approach, in which the TDNN takes previously predicted frames into account, significantly improves the overall performance of continuous emotional state estimation from naturalistic facial expressions. The proposed approach won the affect recognition sub-challenge of the Third International Audio/Visual Emotion Recognition Challenge.
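The abstract outlines a two-stage pipeline: a per-frame regressor first maps facial features to an affective dimension value, and a time-delay neural network then refines those per-frame outputs using a sliding window of preceding predictions. The sketch below is only a minimal illustration of that idea, not the authors' implementation: it assumes scikit-learn's SVR for stage 1 and substitutes a small NumPy network over a fixed delay window for stage 2; the feature matrices, window size, and all hyperparameters are placeholder assumptions.

```python
# Minimal sketch of a two-stage SVR + time-delay network pipeline (illustrative
# only; window size, hyperparameters, and network shape are assumptions).
import numpy as np
from sklearn.svm import SVR


def stage_one_svr(X_train, y_train, X_test):
    """Stage 1: frame-by-frame regression of the affective dimension."""
    svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
    svr.fit(X_train, y_train)
    return svr.predict(X_train), svr.predict(X_test)


def make_delay_windows(pred, delay=4):
    """Stack each stage-1 prediction with its `delay` predecessors."""
    padded = np.concatenate([np.full(delay, pred[0]), pred])
    return np.stack([padded[i:i + delay + 1] for i in range(len(pred))])


class TinyTDNN:
    """Stage 2: one hidden layer over a window of stage-1 outputs, trained
    with plain gradient descent; a stand-in for the paper's TDNN."""

    def __init__(self, delay=4, hidden=8, lr=1e-2, epochs=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(delay + 1, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, 1))
        self.b2 = 0.0
        self.lr, self.epochs = lr, epochs

    def fit(self, Z, y):
        for _ in range(self.epochs):
            H = np.tanh(Z @ self.W1 + self.b1)          # hidden activations
            out = H @ self.W2 + self.b2                  # smoothed prediction
            err = (out.ravel() - y)[:, None] / len(y)    # MSE gradient wrt out
            dH = err @ self.W2.T * (1 - H ** 2)          # backprop through tanh
            self.W2 -= self.lr * (H.T @ err)
            self.b2 -= self.lr * err.sum()
            self.W1 -= self.lr * (Z.T @ dH)
            self.b1 -= self.lr * dH.sum(axis=0)
        return self

    def predict(self, Z):
        return (np.tanh(Z @ self.W1 + self.b1) @ self.W2 + self.b2).ravel()


# Typical call order (X_*, y_train are hypothetical per-frame data):
# train_pred, test_pred = stage_one_svr(X_train, y_train, X_test)
# tdnn = TinyTDNN().fit(make_delay_windows(train_pred), y_train)
# smoothed = tdnn.predict(make_delay_windows(test_pred))
```

Separating the two stages in this way means the temporal model only ever sees the slowly varying stage-1 prediction trace, not the noisy frame-level features, which is the rationale the abstract gives for the two-stage design.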

Highlights

  • Emotional expressions are very important in human communication

  • The work presented in this paper aims to contribute to this research area by proposing a novel framework for automatic emotional state prediction from facial expressions in a continuous space

  • The results show that the combination of support vector regression (SVR)+time-delay neural network (TDNN) outperforms SVR alone


Summary

Introduction

Emotional expressions are very important in human communication. They mediate interaction between people, and they enrich and often clarify the meaning of words or sentences.

