Abstract

Understanding a person’s feelings is a very important process for the affective computing. People express their emotions in various ways. Among them, facial expression is the most effective way to present human emotional status. We propose efficient deep joint spatiotemporal features for facial expression recognition based on the deep appearance and geometric neural networks. We apply three-dimensional (3D) convolution to extract spatial and temporal features at the same time. For the geometric network, 23 dominant facial landmarks are selected to express the movement of facial muscle through the analysis of energy distribution of whole facial landmarks.We combine these features by the designed joint fusion classifier to complement each other. From the experimental results, we verify the recognition accuracy of 99.21%, 87.88%, and 91.83% for CK+, MMI, and FERA datasets, respectively. Through the comparative analysis, we show that the proposed scheme is able to improve the recognition accuracy by 4% at least.

Highlights

  • Affective computing is the study of recognizing human feelings [1]

  • We compare with other state-of-the-art algorithms in facial expression recognition using spatiotemporal networks

  • We have proposed an efficient deep joint spatiotemporal network (DJSTN) that combined the appearance network and geometric network with joint fusion classifier

Read more

Summary

Introduction

Affective computing is the study of recognizing human feelings [1]. Using this technology, we can understand people more effectively. Facial expression recognition is a technique that automatically extracts the features on human face to recognize the patterns of expression It identifies the state of emotions by classifying 6–8 major emotions such as angry, disgust, fear, happy, sad, and surprise. Data augmentation is required to increase the amount of data In this way, the accuracy of facial expression recognition is increasing to near perfection. The accuracy of facial expression recognition is increasing to near perfection There is another problem, namely dependency on dataset [7,8]. Datasets must be rich enough to accept all these variants for facial expression recognition To overcome these problems, researchers make cross-dataset that use multiple datasets for learning [9].

Classical Approaches
Deep Learning-Based Approaches
Datasets and Data Augmentation
Feature Extraction
Appearance Feature-Based Network Structure
Landmark Detection
Geometric Feature-Based Network Structure
Joint Fusion Classifier
Experimental Results and Discussion
Performance as the Number of Input Frames
Performance as the Number of Landmarks
Optimal Weight Analysis for Joint Fusion Classifier
Performance of the Accuracy
Method
Methods
Conclusions

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.