Micro Expression Recognition via Dual-Stream Spatiotemporal Attention Network

Yan Wang, Yikun Huang, Can Liu, Dandan Yang, Xiaoying Gu, Bo Zhang, Shuopeng Wang

doi:10.1155/2021/7799100

Abstract

Microexpressions can reveal people's true emotions and have attracted wide attention in clinical diagnosis and depression analysis. Because microexpressions are brief, involve subtle movements, and are available only in small data sets, discriminative spatiotemporal features are difficult to learn. To address this problem, we present a dual-stream spatiotemporal attention network (DSTAN) that combines a dual-stream spatiotemporal network with attention mechanisms to capture the deformation features and spatiotemporal features of microexpressions from small samples. The spatiotemporal networks in DSTAN are built on two lightweight networks: the spatiotemporal appearance network (STAN), which learns appearance features from microexpression sequences, and the spatiotemporal motion network (STMN), which learns motion features from optical flow sequences. To focus on the discriminative motion regions of microexpressions, we construct a novel attention mechanism for the spatial models of STAN and STMN, consisting of a multiscale kernel spatial attention mechanism and a global dual-pool channel attention mechanism. To estimate the importance of each frame in a microexpression sequence, we design a temporal attention mechanism for the temporal models of STAN and STMN, forming the spatiotemporal appearance network-attention (STAN-A) and the spatiotemporal motion network-attention (STMN-A), which adaptively refine dynamic features. Finally, a feature concatenation-SVM method integrates STAN-A and STMN-A into a novel network, DSTAN. Extensive experiments on three small spontaneous microexpression data sets, SMIC, CASME, and CASME II, demonstrate that the proposed DSTAN can effectively recognize microexpressions.
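
The abstract names two spatial attention components without giving their formulas. The sketch below is a minimal NumPy illustration under a stated assumption: a CBAM-style design, where the global dual-pool channel attention passes globally average-pooled and max-pooled channel descriptors through a shared two-layer MLP, and the multiscale kernel spatial attention convolves the channel-wise mean and max maps with several kernel sizes (3, 5, and 7 here) and sums the results. The function names, the reduction ratio r, the kernel sizes, and the random, untrained weights are all hypothetical; this is not the published DSTAN implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Dual-pool channel attention on a (C, H, W) map; w1: (C//r, C), w2: (C, C//r)."""
    def mlp(v):
        # Shared two-layer MLP with ReLU, applied to both pooled descriptors
        return w2 @ np.maximum(w1 @ v, 0.0)

    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)
    weights = sigmoid(mlp(avg) + mlp(mx))
    return feat * weights[:, None, None]


def conv2d_same(x, kernel):
    """Cross-correlate a (2, H, W) input with a (2, k, k) kernel (odd k, zero padding)."""
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out


def multiscale_spatial_attention(feat, kernels):
    """Spatial attention from channel-wise mean/max maps, summed over several kernel sizes."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    logits = sum(conv2d_same(pooled, k) for k in kernels)     # (H, W)
    return feat * sigmoid(logits)[None, :, :]


rng = np.random.default_rng(0)
C, H, W, r = 8, 12, 12, 4
feat = rng.standard_normal((C, H, W))          # stand-in for a CNN feature map
w1 = rng.standard_normal((C // r, C)) * 0.1    # hypothetical untrained weights
w2 = rng.standard_normal((C, C // r)) * 0.1
kernels = [rng.standard_normal((2, k, k)) * 0.1 for k in (3, 5, 7)]

refined = multiscale_spatial_attention(channel_attention(feat, w1, w2), kernels)
print(refined.shape)  # (8, 12, 12)
```

Because both attention maps pass through a sigmoid, every weight lies in (0, 1), so these modules can only rescale features, never amplify them. In a trained network the weights would be learned by backpropagation rather than drawn from a random generator.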

Highlights

Microexpression is a kind of spontaneous facial expression that can reveal the real emotion that people try to hide. The duration of a microexpression is short, lasting only 1/25 s to 1/5 s [1]
The spatiotemporal appearance network-attention (STAN-A) extracts spatiotemporal appearance features from the original microexpression sequence, and the spatiotemporal motion network-attention (STMN-A) extracts spatiotemporal motion features from the optical flow sequence to describe the subtle motion changes of the microexpression
Then, the spatial features are input into the temporal models of STAN and STMN to obtain the spatiotemporal features of the microexpression
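
The highlights describe per-frame spatial features flowing into a temporal model, and the abstract adds a temporal attention that weights each frame before the two streams are concatenated and classified by an SVM. Below is a minimal NumPy sketch of that final stage, assuming additive (tanh) attention scoring. The temporal model itself is not described in this section, so random per-frame vectors stand in for its outputs. The names temporal_attention_pool, w_app, v_app, w_mot, and v_mot are hypothetical, and the motion stream is assumed to have one fewer frame because optical flow is computed between consecutive frames.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


def temporal_attention_pool(frames, w, v):
    """Score each frame of a (T, D) sequence; return the weighted sum (D,) and weights (T,).

    w: (D, A) projection and v: (A,) scoring vector are hypothetical learned parameters.
    """
    scores = np.tanh(frames @ w) @ v  # one scalar score per frame -> (T,)
    alpha = softmax(scores)           # frame importance, sums to 1
    return alpha @ frames, alpha


rng = np.random.default_rng(0)
T, D, A = 10, 16, 8
appearance_frames = rng.standard_normal((T, D))  # stand-in for STAN per-frame features
motion_frames = rng.standard_normal((T - 1, D))  # stand-in for STMN optical-flow features
w_app, v_app = rng.standard_normal((D, A)) * 0.1, rng.standard_normal(A)
w_mot, v_mot = rng.standard_normal((D, A)) * 0.1, rng.standard_normal(A)

app_vec, app_alpha = temporal_attention_pool(appearance_frames, w_app, v_app)
mot_vec, mot_alpha = temporal_attention_pool(motion_frames, w_mot, v_mot)

# Feature concatenation: one fused descriptor per clip, to be classified by an SVM
fused = np.concatenate([app_vec, mot_vec])
print(fused.shape)  # (32,)
```

In practice the fused vectors for all clips would be stacked into a matrix and passed to an SVM classifier, for example scikit-learn's sklearn.svm.SVC. The kernel and hyperparameters DSTAN uses are not stated in this section.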

Introduction

Microexpression is a kind of spontaneous facial expression that can reveal the real emotion that people try to hide. The duration of a microexpression is short, lasting only 1/25 s to 1/5 s [1]. A large number of automatic recognition methods have emerged, greatly improving the practical feasibility of microexpression analysis. Microexpression recognition has wide application prospects in police interrogation, clinical diagnosis, depression analysis, and other fields [2,3,4,5]. In the microexpression recognition pipeline, feature extraction is the critical step, and researchers strive to find representative features. LBP-TOP (local binary pattern with three orthogonal planes) [6] is a typical texture feature-based method for microexpression recognition and is taken as the baseline for handcrafted methods. Because of LBP-TOP's shortcomings in sensitivity and sparse sampling, many improved methods have been proposed, such as LBP-SIP (local binary pattern with six intersection points) [7], STLBP-IP (spatial-temporal local binary pattern with integral projection) [8], STCLQP
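
For reference, the LBP-TOP baseline [6] can be sketched as below. This is a simplified NumPy illustration under stated assumptions, not the exact descriptor from [6]: it computes basic 8-neighbour, radius-1 LBP codes on every XY, XT, and YT slice of a (T, H, W) grayscale clip, builds a normalised 256-bin histogram for each plane orientation, and concatenates the three. Block division of the face, other radii, and uniform-pattern mapping, which are common in LBP-TOP pipelines, are omitted, and the names OFFSETS, lbp_codes, and lbp_top are hypothetical.

```python
import numpy as np

# Eight neighbours at radius 1, clockwise from the top-left pixel
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(plane):
    """Basic 8-neighbour, radius-1 LBP codes for the interior pixels of a 2-D array."""
    h, w = plane.shape
    center = plane[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def lbp_top(volume):
    """Concatenate normalised 256-bin LBP histograms from the XY, XT and YT planes.

    volume: (T, H, W) grayscale clip with T, H, W >= 3.
    """
    t, h, w = volume.shape
    plane_sets = [
        [volume[i] for i in range(t)],        # XY planes: one (H, W) frame per time step
        [volume[:, y, :] for y in range(h)],  # XT planes: one (T, W) slice per row
        [volume[:, :, x] for x in range(w)],  # YT planes: one (T, H) slice per column
    ]
    hists = []
    for planes in plane_sets:
        codes = np.concatenate([lbp_codes(p).ravel() for p in planes])
        hist = np.bincount(codes, minlength=256).astype(float)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)


rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(8, 16, 16))  # random stand-in for a face clip
feature = lbp_top(clip)
print(feature.shape)  # (768,)
```

The XT and YT histograms are what make the descriptor spatiotemporal: they encode how intensity patterns change over time along each row and column, which a per-frame LBP would miss.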
