Abstract

Emotion recognition plays an important role in human–computer interaction. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches. However, such methods have difficulty exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a flexible multi-modal system for video-based emotion recognition in the wild. Our system tracks and votes on the significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term temporal dependencies: a temporal-pyramid model and a spatiotemporal model with a “Conv2D+LSTM+3DCNN+Classify” architecture. Finally, we propose a best-selection ensemble to improve the accuracy of multi-modal fusion; it selects the combination of spatiotemporal and temporal-pyramid models that achieves the best accuracy in classifying the seven basic emotions. In our experiments, we benchmark the system on the AFEW dataset and achieve high accuracy.
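
As a rough illustration of the “Conv2D+LSTM” portion of the spatiotemporal model, the PyTorch sketch below runs a 2D CNN over each face frame and an LSTM over the resulting feature sequence before a linear classifier. The toy backbone, layer sizes, and input resolution are our assumptions rather than the paper's implementation, and the 3DCNN branch and fusion step are omitted.

    # A minimal sketch, not the authors' released code. Layer sizes and the
    # small CNN backbone are illustrative assumptions.
    import torch
    import torch.nn as nn

    class Conv2DLSTMClassifier(nn.Module):
        def __init__(self, num_classes: int = 7, hidden_dim: int = 256):
            super().__init__()
            # Per-frame 2D CNN feature extractor (the paper may use a
            # pretrained face-feature backbone instead).
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # -> (batch*time, 64, 1, 1)
            )
            # LSTM over per-frame features to capture long-term dependencies.
            self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_dim,
                                batch_first=True)
            self.classifier = nn.Linear(hidden_dim, num_classes)

        def forward(self, clips: torch.Tensor) -> torch.Tensor:
            # clips: (batch, time, channels, height, width)
            b, t, c, h, w = clips.shape
            feats = self.cnn(clips.view(b * t, c, h, w)).view(b, t, -1)
            _, (h_n, _) = self.lstm(feats)   # final hidden state summarizes the clip
            return self.classifier(h_n[-1])  # (batch, num_classes) logits

    model = Conv2DLSTMClassifier()
    logits = model(torch.randn(2, 16, 3, 112, 112))  # 2 clips of 16 face crops
    print(logits.shape)  # torch.Size([2, 7])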

Highlights

  • We propose an overall system with face tracking and voting to select the main face for emotion recognition, using two models based on spatiotemporal and temporal-pyramid architectures to effectively improve emotion recognition (a minimal sketch of the face-voting step follows this list)
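
As a loose sketch of that face-selection step (our illustration; the significance score is an assumption standing in for the paper's actual voting criterion), the Python snippet below groups face detections into tracks and votes for the track that appears most often with the largest average face area:

    # Minimal face-track voting sketch; FaceBox and the scoring rule are
    # hypothetical stand-ins for the paper's tracking-and-voting procedure.
    from dataclasses import dataclass

    @dataclass
    class FaceBox:
        frame: int
        x: float
        y: float
        w: float
        h: float

    def select_main_track(tracks: dict[int, list[FaceBox]]) -> int:
        """Return the id of the track with the highest vote score."""
        def score(boxes: list[FaceBox]) -> float:
            coverage = len(boxes)  # number of frames the face appears in
            mean_area = sum(b.w * b.h for b in boxes) / len(boxes)
            return coverage * mean_area  # favor frequent, large faces

        return max(tracks, key=lambda tid: score(tracks[tid]))

    # Toy usage: track 1 is small and brief; track 2 dominates the video.
    tracks = {
        1: [FaceBox(0, 0, 0, 40, 40)],
        2: [FaceBox(f, 100, 50, 120, 120) for f in range(30)],
    }
    print(select_main_track(tracks))  # -> 2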

Introduction

Emotional cues provide universal signals that enable human beings to communicate during the course of daily activities and are a significant component of social interactions. People use facial expressions, such as a big smile, to signal their happiness to others when they feel joyful. People receive emotional cues (facial expressions, body gestures, tone of voice, etc.) from their social partners and combine them with their own experiences to perceive emotions and make suitable decisions. Researchers have therefore attempted to develop automatic emotion recognition methods based on new technologies in the computer vision and pattern recognition fields. This type of research has a wide range of applications, such as advertising, health monitoring, smart video surveillance, and the development of intelligent robotic interfaces [1].
