Abstract

Dimensional emotion recognition has drawn increasing attention from researchers in fields including psychology, cognitive science, and computer science. In this paper, we propose an emotion-embedded visual attention model (EVAM) that learns emotional context information to predict affective dimension values from video sequences. First, a deep CNN generates high-level representations of the raw face images. Second, a visual attention model based on the gated recurrent unit (GRU) learns context information from the resulting sequences of face features. Third, the k-means algorithm is adapted to embed the previous emotion into the attention model, emphasizing the influence of the previous emotion on the current affective prediction and yielding more robust time-series predictions. All experiments are carried out on the AVEC 2016 and AVEC 2017 databases. The experimental results validate the effectiveness of our method, and competitive results are obtained.
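The abstract's three-stage pipeline can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the CNN features are replaced by random vectors, the GRU is a minimal single-layer cell, the attention vector and centroids are hypothetical placeholders, and the final regressor is a stand-in for the paper's prediction head. It only shows how the stages (feature sequence → GRU attention pooling → previous-emotion cluster embedding → affective prediction) could compose.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes: T frames, D-dim CNN face features, H-dim GRU state.
T, D, H = 10, 16, 8
feats = rng.normal(size=(T, D))        # stand-in for deep CNN face features

# Minimal single-layer GRU run over the feature sequence (random weights).
Wz, Uz = rng.normal(size=(H, D)), rng.normal(size=(H, H))
Wr, Ur = rng.normal(size=(H, D)), rng.normal(size=(H, H))
Wh, Uh = rng.normal(size=(H, D)), rng.normal(size=(H, H))

h, hs = np.zeros(H), []
for x in feats:
    z = sigmoid(Wz @ x + Uz @ h)             # update gate
    r = sigmoid(Wr @ x + Ur @ h)             # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h)) # candidate state
    h = (1 - z) * h + z * h_tilde
    hs.append(h)
hs = np.stack(hs)                            # (T, H) hidden states

# Attention: score each hidden state, normalize, pool into a context vector.
v = rng.normal(size=H)                       # hypothetical attention vector
alpha = softmax(hs @ v)                      # (T,) attention weights, sum to 1
context = alpha @ hs                         # (H,) attended context

# Embed the previous emotion via its k-means cluster (toy 1-D centroids).
centroids = np.array([[-0.5], [0.0], [0.5]]) # assumed cluster centers
prev_emotion = np.array([0.4])               # previous predicted dimension value
cluster = int(np.argmin(np.linalg.norm(centroids - prev_emotion, axis=1)))
emb = np.eye(len(centroids))[cluster]        # one-hot emotion embedding

# Stand-in regressor mapping [context; embedding] to a value in [-1, 1].
w_out = rng.normal(size=H + len(centroids))
pred = float(np.tanh(w_out @ np.concatenate([context, emb])))
print(f"attention weights sum: {alpha.sum():.3f}, prediction: {pred:.3f}")
```

The one-hot cluster embedding is one simple way to inject the previous emotion into the attention model; the paper's actual embedding scheme may differ.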
