Abstract

In this article, we propose a novel model for facial micro-expression (FME) recognition. The proposed model is built around a transformer, an architecture recently adopted for computer vision but not previously applied to FME recognition. Because a transformer requires far more training data than a convolutional neural network, we use motion features such as optical flow, together with late fusion, to compensate for the scarcity of FME data. The proposed method was verified and evaluated on the SMIC and CASME II datasets. Our approach achieved state-of-the-art (SOTA) performance of 0.7447 and 73.17% on SMIC in terms of unweighted F1 score (UF1) and accuracy (Acc.), respectively, which are 0.31 and 1.8% higher than the previous SOTA. Furthermore, it achieved a UF1 of 0.7106 and an Acc. of 70.68% on CASME II, which are comparable with SOTA.
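
The following is a minimal sketch of the kind of two-stream, late-fusion setup described above: one transformer branch sees the apex frame and another sees an optical-flow image, and their class logits are averaged. It is not the authors' exact architecture; the backbone name, the use of timm, the number of classes, and the fusion weights are all assumptions for illustration.

```python
# Hypothetical two-stream late-fusion sketch (not the paper's exact model).
import torch
import timm  # assumed available; any ViT-style backbone would do

NUM_CLASSES = 3  # e.g., negative / positive / surprise (illustrative)

# One transformer branch per input modality.
appearance_branch = timm.create_model(
    "vit_small_patch16_224", pretrained=True, num_classes=NUM_CLASSES)
motion_branch = timm.create_model(
    "vit_small_patch16_224", pretrained=True, num_classes=NUM_CLASSES)

def predict(apex_rgb: torch.Tensor, flow_img: torch.Tensor) -> torch.Tensor:
    """Late fusion: run each stream independently and average the logits.

    apex_rgb, flow_img: (B, 3, 224, 224) tensors; the optical-flow field is
    assumed to have been rendered into a 3-channel image beforehand.
    """
    logits_appearance = appearance_branch(apex_rgb)
    logits_motion = motion_branch(flow_img)
    return 0.5 * (logits_appearance + logits_motion)
```

Late fusion keeps the two streams independent until the decision level, so each branch can be pretrained separately, which is one common way to work around small datasets.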

Highlights

  • Facial micro-expression (FME) often occurs faintly for 0.04–0.2 s when people try to hide their true feelings, unlike macro-expressions, which appear on the face for 0.75–2 s. Owing to these characteristics, building an FME dataset is cost-intensive and few FME datasets exist

  • Our method achieves the best accuracy and unweighted F1 score (UF1) on average on SMIC and shows comparable performance on CASME II

  • We examine whether the motion feature yields significant improvements and analyze the effect of color information, which is considered useless in FME recognition because it is subject-dependent


Summary

Introduction

Facial micro-expression (FME) often occurs faintly for 0.04–0.2 s when people try to hide their true feelings, unlike macro-expressions, which appear on the face for 0.75–2 s. Owing to these characteristics, building an FME dataset is cost-intensive, and the few existing FME datasets, such as SMIC and CASME II [1,2], developed in a strict environment, have a small number of samples. Because of this nature of FME, most early studies [1,3,4,5,6] used handcrafted features such as local binary patterns on three orthogonal planes and optical flow [7]. A recent study [13], which injected CNN-like inductive biases [14], namely locality and a pyramid structure, into transformer models, showed performance similar to CNNs when trained from scratch on the ImageNet dataset.
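
As a concrete illustration of the handcrafted motion features mentioned above, the snippet below computes dense optical flow between the onset and apex frames of a clip using OpenCV's Farneback method and renders it as a 3-channel image. The file paths and the HSV encoding are placeholders/assumptions; the paper does not prescribe this exact procedure.

```python
# Hedged example: dense optical flow between onset and apex frames.
import cv2
import numpy as np

onset = cv2.imread("onset.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
apex = cv2.imread("apex.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder path

# flow has shape (H, W, 2): horizontal and vertical displacement per pixel.
flow = cv2.calcOpticalFlowFarneback(
    onset, apex, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# A common way to feed flow to an image model: encode direction as hue and
# magnitude as value in HSV, then convert to a 3-channel image.
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros((*onset.shape, 3), dtype=np.uint8)
hsv[..., 0] = ang * 180 / np.pi / 2                               # direction
hsv[..., 1] = 255                                                 # saturation
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)   # magnitude
flow_img = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```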

