Abstract

Facial expression recognition in videos is one of the most challenging research topics in computer vision. With advances in deep learning and the promising results of deep neural networks, a significant improvement in the performance of emotion recognition systems has been observed. This paper first presents a fusion feature extraction approach that extracts and combines high-level temporal and spatial features from video sequences. Second, the learned visual features are fed to a hybrid classifier, i.e., a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) recurrent neural network, to identify human expressions automatically. Hybrid AlexNet-LSTM, VGG-LSTM, ResNet-LSTM, and Inception-V2-LSTM classifiers are then trained on the RAVDESS, SAVEE, and AFEW databases. The classification results of the proposed method are compared with those of other models that used the same datasets for video emotion recognition. The proposed method achieves recognition accuracies of 97.6%, 97.1%, and 95.0% on the SAVEE, RAVDESS, and AFEW datasets, respectively.
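The hybrid architecture the abstract describes (a CNN backbone producing per-frame spatial features, followed by an LSTM that models the temporal sequence, ending in an emotion classifier) can be sketched as follows. This is a minimal illustrative sketch in plain NumPy, not the paper's implementation: the CNN backbone is stubbed as a fixed linear projection (the paper uses AlexNet, VGG, ResNet, or Inception-V2), and all layer sizes, weights, and the 7-class label set are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def cnn_features(frame, W):
    """Stand-in for a pretrained CNN backbone: flatten + ReLU projection.
    (Hypothetical stub; the paper uses AlexNet/VGG/ResNet/Inception-V2.)"""
    return np.maximum(0.0, W @ frame.ravel())

def lstm_step(x, h, c, params):
    """One LSTM cell step over the concatenated [x, h] vector."""
    Wf, Wi, Wo, Wg = params
    z = np.concatenate([x, h])
    f = sigmoid(Wf @ z)      # forget gate
    i = sigmoid(Wi @ z)      # input gate
    o = sigmoid(Wo @ z)      # output gate
    g = np.tanh(Wg @ z)      # candidate cell state
    c = f * c + i * g        # new cell state
    h = o * np.tanh(c)       # new hidden state
    return h, c

# Illustrative sizes: 16 frames of 32x32 grayscale video, 64-d CNN features,
# 32-d LSTM hidden state, 7 emotion classes (a common label set for
# SAVEE/RAVDESS/AFEW-style corpora).
T, PIX, F, H, C = 16, 32 * 32, 64, 32, 7
W_cnn = rng.standard_normal((F, PIX)) * 0.01
lstm_params = [rng.standard_normal((H, F + H)) * 0.1 for _ in range(4)]
W_out = rng.standard_normal((C, H)) * 0.1

video = rng.standard_normal((T, 32, 32))
h, c = np.zeros(H), np.zeros(H)
for frame in video:                    # temporal modelling over the sequence
    x = cnn_features(frame, W_cnn)     # spatial features per frame
    h, c = lstm_step(x, h, c, lstm_params)

logits = W_out @ h                     # classify from the final hidden state
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)
```

In this sketch the CNN runs independently on each frame and only the LSTM carries information across time; training would replace the random weights with learned ones via backpropagation through the whole pipeline.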
