Facial expression recognition (FER) is widely applied in real-world settings such as security, human–computer interaction, and healthcare, creating high demand for fast and accurate FER techniques. However, designing FER techniques that are both effective and efficient enough for real-world applications remains challenging. In this paper, we propose a lightweight Gradual Self Distillation Network with adaptive channel attention (GSDNet) for accurate and efficient FER. In particular, we propose a novel gradual self-distillation strategy that enables the network to learn from itself in a gradual and adaptive way. The proposed GSDNet consists of a feature extraction backbone with multiple basic blocks, and an adaptive classifier is attached after each basic block. Every two neighboring classifiers form a teacher–student pair for gradual knowledge distillation, so that key knowledge is transferred gradually from deep to shallow layers. In addition, an Adaptive Channel Attention Module (ACAM) is designed to enhance the representation capability of each block by adaptively capturing important features, further improving FER performance. Extensive experiments on three real-world datasets show that the proposed GSDNet outperforms the baselines, including state-of-the-art methods. Specifically, GSDNet achieves accuracies of 90.91%, 66.11%, and 90.32% on the RAF-DB, AffectNet, and FERPlus datasets, respectively. The code is available at https://github.com/Emy-cv/GSDNet.
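The abstract's core idea, pairing each shallower classifier with its immediate deeper neighbor as a student–teacher pair, can be sketched as a simple loss over per-block logits. The sketch below is illustrative only: the function names are hypothetical, and temperature-softened KL divergence is assumed as the distillation objective, a common choice that the abstract does not specify.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-softened softmax over a list of raw logits.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    # KL(p || q); assumes strictly positive probabilities.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def gradual_distill_loss(block_logits, T=2.0):
    """Gradual self-distillation sketch: each shallower classifier
    (student) matches the output of its immediate deeper neighbor
    (teacher), so knowledge flows from deep to shallow layers.

    block_logits: list of per-class logit lists, ordered shallow
    to deep, one entry per basic block's attached classifier.
    """
    loss = 0.0
    for student, teacher in zip(block_logits[:-1], block_logits[1:]):
        # Teacher targets would be detached (no gradient) in practice.
        loss += kl_div(softmax(teacher, T), softmax(student, T))
    return loss
```

For example, with four classifiers this yields three adjacent student–teacher pairs; the loss is zero when every classifier already agrees with its deeper neighbor and grows as their softened distributions diverge.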