Abstract

Facial emotion recognition (FER) has been an active research topic in the past several years. One of the difficulties in FER is the effective capture of geometric and temporal information from landmarks. In this paper, we propose a graph convolutional neural network that utilizes landmark features for FER, which we call a directed graph neural network (DGNN). Nodes in the graph structure are defined by landmarks, and edges in the directed graph are built by the Delaunay method. By using graph neural networks, we can capture emotional information through faces’ inherent properties, such as geometric and temporal information. In addition, to prevent the vanishing gradient problem, we utilize a stable form of a temporal block in the graph framework. Our experimental results demonstrate the effectiveness of the proposed method on the CK+ (96.02%), MMI (69.4%), and AFEW (32.64%) datasets. A fusion network that uses image information as well as landmarks is also presented and evaluated on the CK+ (98.47%) and AFEW (50.65%) datasets.
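
As a rough sketch of the landmark-to-graph step (our illustration, not the authors' code), the snippet below builds an adjacency matrix from 2-D facial landmarks via SciPy's Delaunay triangulation. Since this excerpt does not specify how edge directions are assigned in the directed graph, the edges are kept symmetric here as a stand-in.

```python
# Minimal sketch of the Delaunay-based graph construction described above.
# `landmarks` is an (N, 2) array of 2-D facial landmark coordinates.
import numpy as np
from scipy.spatial import Delaunay

def landmark_adjacency(landmarks: np.ndarray) -> np.ndarray:
    """Adjacency matrix whose edges are the sides of the Delaunay triangles."""
    tri = Delaunay(landmarks)
    n = len(landmarks)
    adj = np.zeros((n, n), dtype=np.float32)
    for a, b, c in tri.simplices:  # each triangle contributes three edges
        for i, j in ((a, b), (b, c), (c, a)):
            # Assumption: the paper's rule for orienting the directed edges is
            # not given in this excerpt, so edges are kept symmetric here.
            adj[i, j] = adj[j, i] = 1.0
    return adj

# Example with 68 random points standing in for detected facial landmarks.
A = landmark_adjacency(np.random.rand(68, 2))
print(int(A.sum()) // 2, "Delaunay edges")
```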

Highlights

  • Emotion recognition has been widely studied in various areas of computer vision as well as human–computer interaction (HCI)

  • We present a graph-based representation of facial landmarks via a graph neural network (GNN) and propose a facial emotion recognition (FER) algorithm that uses this representation

  • Through a detailed analysis of the GNN application, we found that GNNs are very effective for sparse and arbitrary relational data, unlike convolutional neural networks (CNNs) or multilayer perceptrons (MLPs); a small sketch after this list illustrates the graph-convolution operation
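
As a concrete illustration of the graph-convolution operation behind this claim (assumed here, not taken from the paper), the sketch below applies one layer in the standard Kipf and Welling form, H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W), to per-landmark features; the paper's actual DGNN layer may differ.

```python
# Minimal sketch of one graph-convolution layer over landmark features.
# X: (N, F) node features, A: (N, N) adjacency, W: (F, F_out) weights.
import numpy as np

def gcn_layer(X: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degree^(-1/2)
    D = np.diag(d_inv_sqrt)                        # symmetric normalization
    return np.maximum(D @ A_hat @ D @ X @ W, 0.0)  # linear map + ReLU

# Example: 68 landmarks with 2-D coordinates as input features.
X = np.random.rand(68, 2)
A = np.ones((68, 68)) - np.eye(68)  # placeholder adjacency (fully connected)
print(gcn_layer(X, A, np.random.rand(2, 16)).shape)  # -> (68, 16)
```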


Introduction

Emotion recognition has been widely studied in various areas of computer vision as well as human–computer interaction (HCI). Various emotion recognition techniques that utilize different modalities, such as video streams, audio signals, and bio-signals, have been proposed. Kuo et al. [1] extracted appearance and geometry features from image sequences and combined them via joint fine-tuning. Zhang et al. [3] combined temporal information from EEG signals and spatial information from facial images for human emotion recognition. Facial landmarks provide useful information for analyzing facial emotions. Yan et al. [4] defined facial landmarks as derivatives of action units (AUs) for describing facial muscle movements. Other studies [5,6]
