Abstract
The proposed method comprises 30 streams: 15 spatial streams and 15 temporal streams, with each spatial stream paired with a corresponding temporal stream; this pairing reflects the symmetry concept. Classifying facial expressions in video is a difficult task owing to the gap between visual descriptors and emotions. To bridge this gap, a new video descriptor for facial expression recognition is presented that aggregates spatial and temporal convolutional features across the entire extent of a video. The designed framework integrates the 30 streams with a trainable spatial–temporal feature aggregation layer and is end-to-end trainable for video-based facial expression recognition. The framework can therefore effectively avoid overfitting to the limited emotional video datasets, and the trainable strategy learns to better represent an entire video. Different schemes for pooling spatial–temporal features are investigated, and the spatial and temporal streams are best aggregated by the proposed method. Extensive experiments on two public databases, BAUM-1s and eNTERFACE05, show that the framework achieves promising performance and outperforms state-of-the-art methods.
Highlights
Facial expressions are non-verbal information that can complement verbal communication. Video-based facial expression recognition (VFER) aims to automatically classify human expression categories in videos.
We test the different positions in our framework where the EmotionalVlan layer can be inserted.
Our framework generates more comprehensive and effective features for VFER than state-of-the-art methods owing to the EmotionalVlan layer, a trainable aggregation layer that separately pools the temporal and spatial features extracted by the fc7 layer.
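The page does not reproduce the EmotionalVlan layer's implementation. As a rough illustration of the kind of trainable aggregation it describes, the sketch below implements a NetVLAD-style pooling step in plain Python: per-frame descriptors are softly assigned to learnable cluster centers and their residuals are accumulated into a fixed-length video descriptor. The function name `vlad_aggregate` and all parameter shapes are assumptions for illustration, not the authors' code; in the actual framework the centers and assignment weights would be learned end-to-end.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def vlad_aggregate(frames, centers, weights, biases):
    """NetVLAD-style soft aggregation of per-frame descriptors (illustrative sketch).

    frames:  list of D-dim feature vectors, one per video frame (e.g. fc7 outputs)
    centers: K cluster centers, each D-dim (trainable in a real layer)
    weights: K x D soft-assignment weights; biases: K soft-assignment biases
    Returns a flattened K*D descriptor of accumulated, per-cluster-normalized residuals.
    """
    K, D = len(centers), len(centers[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x in frames:
        # Soft assignment a_k = softmax_k(w_k . x + b_k)
        logits = [sum(w * xi for w, xi in zip(weights[k], x)) + biases[k]
                  for k in range(K)]
        a = softmax(logits)
        # Accumulate the residual (x - c_k), weighted by the assignment
        for k in range(K):
            for d in range(D):
                vlad[k][d] += a[k] * (x[d] - centers[k][d])
    # L2-normalize each cluster's residual sum ("intra-normalization")
    out = []
    for row in vlad:
        n = math.sqrt(sum(v * v for v in row)) or 1.0
        out.extend(v / n for v in row)
    return out
```

In the paper's setting, two such aggregations would run side by side, one over the spatial-stream features and one over the temporal-stream features, before the pooled descriptors are fused for classification.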
Summary
Facial expressions are non-verbal information that can complement verbal communication. Video-based facial expression recognition (VFER) aims to automatically classify human expression categories in videos. A large number of researchers have become interested in VFER over the past few decades. VFER is a challenging task because there is a large gap between visual features and emotions [1]. It has potential applications in healthcare, robotics, and driver safety [2,3,4,5,6]. The work in reference [7] defined the six facial expressions of anger, disgust, fear, happiness, sadness, and surprise in 1993.