Abstract

This paper extends our ACM Multimedia 2022 paper, which ranked 3rd in the macro-expression (MaE) and micro-expression (ME) spotting task of the FME Challenge 2021. In our earlier work, we proposed a deep learning framework based on facial action units (AUs) that emphasizes both local and global features for the MaE and ME spotting tasks. In this paper, we propose an advanced Concat-CNN model that not only exploits the AU features, which our previous work showed to be more effective for detecting MaEs, but also fuses them with optical flow features to improve the detection of MEs. The advanced Concat-CNN considers not only the intra-frame correlations among the features of a single frame but also the inter-frame correlations between frames. Furthermore, we devise a new adaptive re-labeling method that assigns distinctive scores to emotional frames, accounting for the dynamic changes of expressions and thereby further improving the overall detection performance. Compared with our earlier work and several existing methods, the proposed deep learning pipeline achieves better overall F1-scores: 0.2623 on CAS(ME)2, 0.2839 on CAS(ME)2-cropped, and 0.3241 on SAMM-LV.
