Abstract

The proposed method has 30 streams: 15 spatial streams and 15 temporal streams, with each spatial stream paired with a corresponding temporal stream; the work therefore relates to the concept of symmetry. Classifying facial expressions in video is difficult owing to the gap between visual descriptors and emotions. To bridge this gap, a new video descriptor for facial expression recognition is presented that aggregates spatial and temporal convolutional features across the entire extent of a video. The designed framework integrates the 30 streams with a trainable spatial–temporal feature aggregation layer and is end-to-end trainable for video-based facial expression recognition. As a result, the framework can effectively avoid overfitting to the limited emotional video datasets, and the trainable aggregation strategy learns a better representation of an entire video. Different schemes for pooling spatial–temporal features are investigated, and the spatial and temporal streams are best aggregated by the proposed method. Extensive experiments on two public databases, BAUM-1s and eNTERFACE05, show that the framework achieves promising performance and outperforms state-of-the-art methods.
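The abstract does not include code, so the following is only a minimal PyTorch sketch of how the 30-stream layout could be organized: the video is divided into 15 segments, and each segment feeds one spatial stream and its paired temporal stream. The class name, the tiny stand-in CNN, and the choice of 10 optical-flow channels per temporal stream are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def make_cnn(in_channels):
    # Tiny stand-in for the per-stream CNN up to its fc7-like layer.
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(16, 4096),
    )

class ThirtyStreamBackbone(nn.Module):
    """Hypothetical layout of the 30 streams: 15 video segments, each
    feeding one spatial stream (an RGB frame) and its paired temporal
    stream (stacked optical-flow fields)."""
    def __init__(self, num_segments=15, flow_channels=10):
        super().__init__()
        self.spatial = nn.ModuleList([make_cnn(3) for _ in range(num_segments)])
        self.temporal = nn.ModuleList([make_cnn(flow_channels) for _ in range(num_segments)])

    def forward(self, rgb, flow):
        # rgb:  (B, 15, 3, H, W)   one representative frame per segment
        # flow: (B, 15, 10, H, W)  stacked flow fields per segment
        s = torch.stack([net(rgb[:, i]) for i, net in enumerate(self.spatial)], dim=1)
        t = torch.stack([net(flow[:, i]) for i, net in enumerate(self.temporal)], dim=1)
        return s, t  # per-segment fc7-like features, each (B, 15, 4096)
```

The pairing of spatial and temporal streams per segment is what gives the architecture its symmetric structure; the two sets of per-segment features are then aggregated by the trainable pooling layer described in the highlights below.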

Highlights

  • Facial expressions are non-verbal information that can complement verbal information; video-based facial expression recognition (VFER) aims to automatically classify human expression categories in video

  • We test different positions in our framework at which the EmotionalVlan layer can be inserted

  • Owing to the EmotionalVlan layer, a trainable aggregation layer that separately pools the temporal and spatial features extracted by fc7, our framework generates more comprehensive and effective features for VFER than state-of-the-art methods (see the sketch below)
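The highlights name the EmotionalVlan layer but do not give its implementation; the sketch below is a plausible NetVLAD-style stand-in for a trainable aggregation layer that pools per-segment fc7 features into one fixed-length video descriptor. The class name, cluster count, and the way the two pooled descriptors are concatenated are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VladPooling(nn.Module):
    """NetVLAD-style trainable aggregation: pools a set of per-segment
    descriptors into one fixed-length descriptor. Only a plausible
    stand-in for the paper's EmotionalVlan layer."""
    def __init__(self, feature_dim=4096, num_clusters=8):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, feature_dim) * 0.01)
        self.assign = nn.Linear(feature_dim, num_clusters)  # soft cluster assignment

    def forward(self, x):
        # x: (B, T, D) fc7 features from the 15 spatial or 15 temporal streams
        soft = F.softmax(self.assign(x), dim=-1)             # (B, T, K)
        residuals = x.unsqueeze(2) - self.centroids          # (B, T, K, D)
        vlad = (soft.unsqueeze(-1) * residuals).sum(dim=1)   # (B, K, D)
        vlad = F.normalize(vlad, dim=-1)                     # intra-normalization
        return F.normalize(vlad.flatten(1), dim=-1)          # (B, K*D)

# Spatial and temporal fc7 features are pooled by separate layers and
# concatenated into the final video descriptor fed to the classifier.
spatial_pool, temporal_pool = VladPooling(), VladPooling()
s_feats = torch.randn(2, 15, 4096)   # fc7 outputs of the 15 spatial streams
t_feats = torch.randn(2, 15, 4096)   # fc7 outputs of the 15 temporal streams
video_descriptor = torch.cat([spatial_pool(s_feats), temporal_pool(t_feats)], dim=1)
```

Because every operation in the layer is differentiable, the centroids and assignment weights can be trained end-to-end with the streams, which is what lets the aggregation be inserted at different points in the framework and learned jointly rather than fixed in advance.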


Introduction

Facial expressions are non-verbal information that can complement verbal communication. Video-based facial expression recognition (VFER) aims to automatically classify human expression categories in video, and it has attracted a large number of researchers over the past decades. VFER is a challenging task because there is a large gap between visual features and emotions [1]. It has potential applications in healthcare, robotics, and driver safety [2,3,4,5,6]. The work of reference [7] defined the six facial expressions of anger, disgust, fear, happiness, sadness, and surprise in 1993.

