Abstract

Facial expression recognition (FER) is more challenging in the wild due to unconstrained conditions such as varying illumination, pose changes, and facial occlusion. Current FER methods deploy attention mechanisms in deep neural networks to improve performance; however, these models capture only limited attention features and relationships. This paper therefore proposes a novel FER framework, the multi-relations aware network (MRAN), which focuses on global and local attention features and learns multi-level relationships (among local regions, between global and local features, and among different samples) to obtain efficient emotional features. Specifically, our method first imposes spatial attention on both the whole face and local regions to learn global and local salient features simultaneously. A region relation transformer is then deployed to capture the internal structure among local facial regions, and a global-local relation transformer is designed to learn the fusion relations between global and local features for different facial expressions. Subsequently, a sample relation transformer models the intrinsic similarity relationships among training samples, which promotes invariant feature learning for each expression. Finally, a joint optimization strategy is designed to optimize the model efficiently. Experimental results on in-the-wild databases show that our method achieves superior performance compared to several state-of-the-art models.
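The region relation transformer described above is, at its core, self-attention over per-region feature vectors. The abstract does not give the exact architecture, so the following is a minimal single-head sketch under stated assumptions: the number of regions, feature dimensions, and the randomly initialised projection matrices (`W_q`, `W_k`, `W_v`) are all hypothetical stand-ins for the learned parameters of the actual model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for the attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_relation_attention(regions, d_k=16, seed=0):
    """Single-head scaled dot-product self-attention over local region features.

    regions: (num_regions, feat_dim) array, one feature vector per face patch
             (e.g. eyes, brows, nose, mouth).
    Returns a (num_regions, feat_dim) array in which each region's feature is
    enriched by its relations to every other region.
    """
    rng = np.random.default_rng(seed)
    n, d = regions.shape
    # Hypothetical projections; in the real model these would be learned.
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d)) / np.sqrt(d)
    Q, K, V = regions @ W_q, regions @ W_k, regions @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n, n) pairwise relation weights
    return regions + attn @ V               # residual connection

# Usage: 5 local regions with 32-dim features.
feats = np.random.default_rng(1).standard_normal((5, 32))
out = region_relation_attention(feats)
```

The global-local relation transformer in the paper would differ mainly in what attends to what (global tokens attending over local ones rather than regions over regions), but the attention computation itself follows the same pattern.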
