Micro-expression recognition (MER) is a challenging computer vision problem: the limited amount of available training data and the low intensity of the facial expressions are among the main issues adversely affecting the performance of existing recognition models. To address these challenges, this paper explores a transfer-learning-enabled MER model built on a densely connected feature extraction module with mixed attention. Unlike previous works that utilize transfer learning to facilitate MER and extract local facial-expression information, our model relies on pretraining with three diverse macro-expression datasets and can therefore: (i) overcome the problems of insufficient sample size and limited training data availability, (ii) leverage (related) domain-specific information from multiple datasets with diverse characteristics, and (iii) improve the model's adaptability to complex scenes. Furthermore, to enhance the intensity of the micro-expressions and improve the discriminability of the extracted features, the Eulerian video magnification (EVM) method is adopted in the preprocessing stage and then used jointly with the densely connected feature extraction module and the mixed attention mechanism to derive expressive feature representations for classification. The proposed feature extraction mechanism not only preserves the integrity of the extracted features but also efficiently captures local texture cues by aggregating the most salient information from the generated feature maps, which is key for the MER task. Experimental results on multiple datasets demonstrate the robustness and effectiveness of our model compared to the state-of-the-art.
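To illustrate the EVM preprocessing idea described above, the sketch below amplifies subtle temporal variations by bandpass-filtering each pixel's time series and adding the scaled result back. This is a minimal approximation of the principle, not the authors' implementation; the function name, filter band, and amplification factor `alpha` are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_motion(frames, fs=30.0, low=0.4, high=3.0, alpha=10.0):
    """Core idea of Eulerian video magnification (simplified):
    bandpass-filter each pixel's intensity over time, scale the
    filtered signal by `alpha`, and add it back to the frames.
    `frames` has shape (time, height, width)."""
    b, a = butter(2, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, frames, axis=0)  # filter along the temporal axis
    return frames + alpha * filtered

# Toy example: 64 frames of an 8x8 patch with a tiny 1 Hz intensity flicker,
# standing in for a low-intensity micro-expression.
t = np.arange(64) / 30.0
flicker = 0.001 * np.sin(2 * np.pi * 1.0 * t)[:, None, None]
frames = 0.5 + flicker * np.ones((64, 8, 8))
magnified = magnify_motion(frames)
```

After magnification, the temporal variation of the toy signal is roughly an order of magnitude larger, which is what makes the subsequent feature extraction easier in the low-intensity micro-expression setting.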