Abstract

In recent years, the analysis of macro- and micro-expressions has drawn increasing attention from researchers. These expressions provide visual cues to an individual's emotions and have a broad range of potential applications, such as lie detection and policing. In this paper, we address the challenge of spotting facial macro- and micro-expressions in videos and present compelling results using a deep learning approach that analyzes optical flow features. Unlike other deep learning approaches, which are mainly based on Convolutional Neural Networks (CNNs), we propose a Transformer-based approach that predicts a score indicating the probability of a frame lying within an expression interval. In contrast to other Transformer-based models that achieve high performance by being pre-trained on large datasets, our model, called SL-Swin, incorporates Shifted Patch Tokenization and Locality Self-Attention into the backbone Swin Transformer network and effectively spots macro- and micro-expressions when trained from scratch on small expression datasets. Our evaluation outcomes surpass the MEGC 2022 spotting baseline result, obtaining an overall F1-score of 0.1366. Additionally, our approach performs well on the MEGC 2021 spotting task, with overall F1-scores of 0.1824 and 0.1357 on the CAS(ME)² and SAMM Long Videos datasets, respectively. The code is publicly available on GitHub.
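To make the two named components concrete, below is a minimal PyTorch-style sketch of Shifted Patch Tokenization and Locality Self-Attention as described in the small-dataset ViT literature (Lee et al., 2021). This is an illustrative sketch, not the authors' released code: the class names, hyperparameters, and the use of a cyclic shift in place of a zero-padded spatial shift are all assumptions made here for brevity.

```python
# Illustrative sketch of Shifted Patch Tokenization (SPT) and Locality
# Self-Attention (LSA); names and details are assumptions, not the paper's code.
import torch
import torch.nn as nn


class ShiftedPatchTokenization(nn.Module):
    """Concatenate the input with four diagonally shifted copies before patch
    embedding, enlarging each token's receptive field on small datasets."""

    def __init__(self, in_ch=3, dim=96, patch_size=4):
        super().__init__()
        self.patch_size = patch_size
        # 5x the channels: the original image plus 4 shifted copies.
        self.proj = nn.Conv2d(5 * in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                               # x: (B, C, H, W)
        s = self.patch_size // 2
        # Cyclic shift is a simple stand-in for the zero-padded shift in SPT.
        shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]   # 4 diagonal directions
        shifted = [torch.roll(x, sh, dims=(2, 3)) for sh in shifts]
        x = torch.cat([x] + shifted, dim=1)             # (B, 5C, H, W)
        x = self.proj(x)                                # (B, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)                # (B, N, dim)
        return self.norm(x)


class LocalitySelfAttention(nn.Module):
    """Self-attention with a learnable temperature and diagonal masking,
    which sharpens attention distributions when training from scratch."""

    def __init__(self, dim, num_heads=3):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable temperature replaces the fixed 1/sqrt(d) scale.
        self.temperature = nn.Parameter(torch.tensor(head_dim ** -0.5))

    def forward(self, x):                               # x: (B, N, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # Mask the diagonal so each token attends to the others, not itself.
        mask = torch.eye(N, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(mask, float('-inf')).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

In the abstract's framing, modules like these would replace the standard patch embedding and attention inside the Swin Transformer backbone, with the network's output head producing the per-frame expression-interval score.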
