Abstract
Facial expression recognition (FER) is a challenging problem in the fields of pattern recognition and computer vision. The recent success of convolutional neural networks (CNNs) in object detection and object segmentation tasks has shown promise in building an automatic deep CNN-based FER model. However, in real-world scenarios, performance degrades dramatically owing to the great diversity of factors unrelated to facial expressions, and due to a lack of training data and an intrinsic imbalance in the existing facial emotion datasets. To tackle these problems, this paper not only applies deep transfer learning techniques, but also proposes a novel loss function called weighted-cluster loss, which is used during the fine-tuning phase. Specifically, the weighted-cluster loss function simultaneously improves the intra-class compactness and the inter-class separability by learning a class center for each emotion class. It also takes the imbalance in a facial expression dataset into account by giving each emotion class a weight based on its proportion of the total number of images. In addition, a recent, successful deep CNN architecture, pre-trained in the task of face identification with the VGGFace2 database from the Visual Geometry Group at Oxford University, is employed and fine-tuned using the proposed loss function to recognize eight basic facial emotions from the AffectNet database of facial expression, valence, and arousal computing in the wild. Experiments on an AffectNet real-world facial dataset demonstrate that our method outperforms the baseline CNN models that use either weighted-softmax loss or center loss.
Highlights
Facial expressions are undoubtedly a dominant, natural, and effective channel used by people to convey their emotions and intentions during communication
We constructed deep convolutional neural network (CNN)-based facial expression recognition (FER) models to recognize eight common facial expressions from the AffectNet database of facial expression, valence, and arousal computing in the wild [5]
By analyzing the complementary nature of weighted loss and center loss, we propose a new loss function, named weighted-cluster loss, which takes highly skewed facial emotion data into consideration, but it uses multiple loss terms to improve the performance of the FER models
Summary
Facial expressions are undoubtedly a dominant, natural, and effective channel used by people to convey their emotions and intentions during communication. The experimental results with the AffectNet dataset [5] show that our transfer learning-based model with weighted-cluster loss outperforms other models that use either weighted softmax-loss or center loss. To address the overfitting problem, a deep transfer-based framework is proposed in which we utilize an SE-ResNet-50 model [21] (which was pre-trained on VGGFace data [22]) for fine-tuning on AffectNet dataset. To alleviate the imbalanced-data problem, we propose a new loss function, named weighted-cluster loss, which gives weights to each emotion class’s loss terms based on their relative proportion of the total number of samples in the training dataset.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.