Learning Expression Features via Deep Residual Attention Networks for Facial Expression Recognition From Video Sequences

Gang Chen, Shiqing Zhang, Xiaoming Zhao, Xin Tao, Yuelong Chuang

doi:10.1080/02564602.2020.1814168

Abstract

Facial expression recognition from video sequences is an active research topic in computer vision, pattern recognition, and artificial intelligence. It remains challenging because of the semantic gap between the hand-designed features extracted from affective videos and the subjective emotions they are meant to represent. To tackle this problem, this paper proposes a new method of facial expression recognition from video sequences based on deep residual attention networks. First, because the local regions of a facial image differ in how strongly they express emotion, deep residual attention networks are employed to learn high-level affective expression features for each frame of a video sequence. These networks integrate deep residual networks with a spatial attention mechanism. Then, average pooling is performed over the frame-level features to produce a fixed-length global video-level feature representation. Finally, this video-level representation is used as the input to a multi-layer perceptron that classifies the facial expression in the video sequence. Experimental results on two public video emotion datasets, BAUM-1s and RML, demonstrate the effectiveness of the proposed method.
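
The three stages described in the abstract (spatially attended frame features, average pooling over frames, and a multi-layer perceptron classifier) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the attention formulation, layer sizes, or number of classes, so everything below is an assumption made for illustration: the function names, the softmax mask computed from channel-averaged activations, the (1 + mask) residual weighting, the single hidden layer, and the placeholder class count of 6. Each frame is assumed to arrive as a (channels, height, width) feature map already produced by a residual backbone, which is not modeled here.

```python
import numpy as np


def spatial_attention(feature_map):
    """Reweight a (C, H, W) feature map with a softmax mask over its H*W positions.

    Assumption: the mask comes from channel-averaged activations, and the
    output uses the residual form (1 + mask) * features so that the original
    signal is preserved. The paper's actual attention module may differ.
    """
    _, h, w = feature_map.shape
    scores = feature_map.mean(axis=0).reshape(-1)  # one score per spatial position
    scores = scores - scores.max()                 # shift for numerical stability
    mask = np.exp(scores) / np.exp(scores).sum()   # softmax over positions
    return (1.0 + mask.reshape(1, h, w)) * feature_map


def frame_feature(feature_map):
    """Collapse an attended (C, H, W) map into a C-dimensional frame descriptor."""
    return spatial_attention(feature_map).mean(axis=(1, 2))


def video_feature(frame_maps):
    """Average-pool the frame descriptors into one fixed-length video vector."""
    return np.stack([frame_feature(m) for m in frame_maps]).mean(axis=0)


def mlp_predict(x, w1, b1, w2, b2):
    """One-hidden-layer perceptron with ReLU, returning class probabilities."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max()                 # shift for numerical stability
    return np.exp(logits) / np.exp(logits).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, hidden_units, num_classes = 8, 16, 6  # placeholder sizes

    # Two clips of different length map to vectors of the same size.
    clip_a = [rng.standard_normal((channels, 4, 4)) for _ in range(3)]
    clip_b = [rng.standard_normal((channels, 4, 4)) for _ in range(10)]
    print(video_feature(clip_a).shape, video_feature(clip_b).shape)

    # Randomly initialised weights stand in for a trained classifier.
    w1 = rng.standard_normal((channels, hidden_units))
    b1 = np.zeros(hidden_units)
    w2 = rng.standard_normal((hidden_units, num_classes))
    b2 = np.zeros(num_classes)
    print(mlp_predict(video_feature(clip_a), w1, b1, w2, b2))
```

The point of the sketch is the pooling step: because the frame descriptors are averaged, clips with different numbers of frames yield video-level vectors of identical length, which is what allows a fixed-size classifier to follow.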

Full Text