Abstract

Driven by advances in deep learning, facial expression recognition has progressed substantially over the past decade, yet occlusions, pose variations, and subtle differences between expressions remain challenging in unconstrained (in-the-wild) scenarios. This paper therefore proposes a novel multiscale feature extraction method that uses convolutional neural networks to extract deep semantic features and shallow geometric features simultaneously. A channel-wise self-attention mechanism then selects and compresses the most salient features, preserving those most useful for discrimination and thereby reducing the impact of occlusions and pose variations on expression recognition. Meanwhile, inspired by the large cosine margin used in face recognition, a center cosine loss function is proposed to avoid misclassifications caused by the inherent inter-class similarity and large intra-class feature variations in expression recognition: it improves the network's classification performance by making the feature distribution within each class more compact and the distributions of different classes more separated. The proposed method is benchmarked against several strong baseline models on three mainstream in-the-wild datasets and two datasets posing realistic occlusion and pose-variation challenges. It achieves accuracies of 89.63%, 61.82%, and 91.15% on RAF-DB, AffectNet, and FERPlus, respectively, demonstrating greater robustness and reliability than state-of-the-art alternatives in real-world conditions.
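The abstract does not give the exact formulation of the center cosine loss, but the idea of combining class centers with a cosine margin can be sketched as follows. This is a minimal illustration, not the paper's definition: the function name, the margin value, and the particular intra-/inter-class terms are all assumptions.

```python
import numpy as np

def center_cosine_loss(features, labels, centers, margin=0.35):
    """Hedged sketch of a center-cosine-style loss: pull each feature toward
    its own class center in cosine space (intra-class compactness) and push
    other class centers outside a cosine margin (inter-class separation).
    The exact terms and margin are illustrative assumptions."""
    # L2-normalize features and class centers so dot products are cosines.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    cos = f @ c.T  # (batch, num_classes) cosine similarities
    n = len(labels)
    target = cos[np.arange(n), labels]  # similarity to each sample's own center
    # Intra-class term: drive similarity to the own-class center toward 1.
    intra = np.mean(1.0 - target)
    # Inter-class term: penalize other centers that come within `margin`
    # of the own-center similarity (hinge on the cosine gap).
    mask = np.ones_like(cos, dtype=bool)
    mask[np.arange(n), labels] = False
    others = cos[mask].reshape(n, -1)           # similarities to wrong centers
    inter = np.mean(np.maximum(0.0, others - (target[:, None] - margin)))
    return intra + inter
```

Features aligned with their class centers yield a loss near zero, while features sitting near a wrong center incur both the intra-class and the margin penalty, which is the compact-within / separated-between behavior the abstract describes.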
