VIT and Bi-LSTM for Micro-Expressions Recognition

Huilong Chen, Jianjiang Cui, Yonglin Zhang, Yucheng Zhang

doi:10.1109/iciscae55891.2022.9927522

Abstract

Micro-expressions, which are spontaneous and usually occur when people try to hide their real emotions, have recently attracted much attention because of their many real-world applications. Because micro-expressions involve only subtle facial changes, both hand-crafted and deep learned features still face challenges in recognizing them. This paper proposes a novel network based on the Vision Transformer (VIT) and the Bidirectional Long Short-Term Memory network (Bi-LSTM). The network combines optical flow with deep learning and can capture the spatial-temporal deformations of a micro-expression sequence. First, an optical flow sequence and an RGB sequence are extracted from the micro-expression sequence and combined into feature maps that serve as the input data. VIT then extracts the spatial features of each feature map by encoding it into a feature vector, and the Bi-LSTM transforms the resulting sequence of feature vectors into temporal features of the micro-expression. Finally, a classification layer maps these discriminative features to micro-expression categories. To evaluate the method, we conduct experiments on the CASME II micro-expression database and compare VIT with several classical CNN networks. The proposed method achieves a recognition accuracy of 86.67% and an F1 score of 0.864, distinguishing micro-expression categories more accurately than existing methods. Moreover, compared with several classical CNN networks, VIT performs better at extracting spatial facial features.
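The abstract describes a three-stage pipeline: fused optical-flow/RGB feature maps, a per-frame VIT encoder, and a Bi-LSTM followed by a classification layer. The NumPy sketch below traces that data flow under stated assumptions; it is an illustration, not the authors' implementation. Assumed (not given in the abstract): fusion by channel-wise concatenation of RGB (3 channels) and optical flow (2 channels, horizontal u and vertical v); a single-block, single-head ViT that outputs its [CLS] token; a Bi-LSTM summarized by concatenating its final forward and backward hidden states; five output classes; and all sizes. The helper names (`fuse`, `TinyViT`, `LSTM`, `VitBiLstm`) are hypothetical, and the weights are random and untrained, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only (not from the paper).
T, H_IMG, W_IMG = 8, 32, 32  # frames per clip, frame height and width
C_IN = 5                     # assumed fusion: 3 RGB channels + 2 flow channels (u, v)
P, D = 8, 16                 # ViT patch size and embedding width
HID = 8                      # Bi-LSTM hidden size per direction
N_CLASSES = 5                # assumed number of micro-expression categories


def fuse(rgb, flow):
    """Assumed fusion: stack RGB (T,H,W,3) and optical flow (T,H,W,2) channel-wise."""
    return np.concatenate([rgb, flow], axis=-1)


def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)


def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyViT:
    """One pre-norm Transformer block with single-head attention; returns the [CLS] vector."""

    def __init__(self):
        n_tokens = (H_IMG // P) * (W_IMG // P) + 1  # patches plus [CLS]
        self.W_patch = rng.normal(0, 0.02, (P * P * C_IN, D))
        self.cls = rng.normal(0, 0.02, (1, D))
        self.pos = rng.normal(0, 0.02, (n_tokens, D))
        self.Wq, self.Wk, self.Wv, self.Wo = (rng.normal(0, 0.1, (D, D)) for _ in range(4))
        self.W1 = rng.normal(0, 0.1, (D, 2 * D))
        self.W2 = rng.normal(0, 0.1, (2 * D, D))

    def __call__(self, frame):  # frame: (H_IMG, W_IMG, C_IN)
        # Cut the frame into non-overlapping P x P patches and flatten each one.
        patches = frame.reshape(H_IMG // P, P, W_IMG // P, P, C_IN)
        patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * C_IN)
        x = np.vstack([self.cls, patches @ self.W_patch]) + self.pos
        h = layer_norm(x)
        q, k, v = h @ self.Wq, h @ self.Wk, h @ self.Wv
        x = x + softmax(q @ k.T / np.sqrt(D)) @ v @ self.Wo        # attention + residual
        x = x + np.maximum(layer_norm(x) @ self.W1, 0) @ self.W2  # ReLU MLP + residual
        return layer_norm(x)[0]  # [CLS] token = spatial feature of this frame


class LSTM:
    """Single-direction LSTM that returns its last hidden state."""

    def __init__(self, d_in, hid):
        self.W = rng.normal(0, 0.1, (d_in + hid, 4 * hid))
        self.b = np.zeros(4 * hid)
        self.hid = hid

    def run(self, seq):  # seq: (T, d_in)
        h, c = np.zeros(self.hid), np.zeros(self.hid)
        for x_t in seq:
            i, f, o, g = np.split(np.concatenate([x_t, h]) @ self.W + self.b, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell update: forget gate f, input gate i
            h = sigmoid(o) * np.tanh(c)                   # hidden state via output gate o
        return h


class VitBiLstm:
    """Per-frame ViT features -> Bi-LSTM over time -> linear classifier + softmax."""

    def __init__(self):
        self.vit = TinyViT()
        self.fwd, self.bwd = LSTM(D, HID), LSTM(D, HID)
        self.W_out = rng.normal(0, 0.1, (2 * HID, N_CLASSES))

    def __call__(self, rgb, flow):
        feats = np.stack([self.vit(frame) for frame in fuse(rgb, flow)])  # (T, D)
        # Forward pass reads frames 0..T-1, backward pass reads T-1..0.
        temporal = np.concatenate([self.fwd.run(feats), self.bwd.run(feats[::-1])])
        return softmax(temporal @ self.W_out)  # (N_CLASSES,) class probabilities


model = VitBiLstm()
rgb = rng.random((T, H_IMG, W_IMG, 3))        # stand-in RGB frames
flow = rng.normal(size=(T, H_IMG, W_IMG, 2))  # stand-in optical flow (u, v)
probs = model(rgb, flow)
print(probs.shape)  # (5,)
```

Concatenating the final forward and backward hidden states is one common way to summarize a Bi-LSTM; pooling over all time steps is another, and the abstract does not specify which variant is used.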
