Multimodal Spatio-Temporal Framework for Real-World Affect Recognition

Karishma Raut, Sujata Kulkarni, Ashwini Sawant

doi:10.1016/j.ijin.2024.10.001

Abstract

Deep learning models show great potential for video-based affect recognition, with applications in human-computer interaction, robotic interfaces, stress and depression assessment, and Alzheimer's disease detection. This work analyses a low-complexity Multimodal Diverse Spatio-Temporal Network (MDSTN) that captures spatio-temporal information from the audio and visual modalities for affect recognition on the Acted Facial Expressions in the Wild (AFEW) dataset. Data scarcity is addressed through data-augmented, parallel feature extraction in the visual network. Visual features are extracted with a carefully reviewed and customized Convolutional 3D architecture applied over different ranges, and these features are combined to train a neural network classifier. On the audio side, Multi-resolution Cochleagram (MRCG) features from speech, together with spectral and prosodic features, are processed by a supervised classifier. Late fusion is then used to integrate the audio and video modalities, taking into account that the two are processed over different temporal spans. MDSTN raises basic emotion recognition accuracy on AFEW to 71.54%. It is particularly strong at identifying disgust and surprise, exceeding current benchmarks for real-world affect recognition.
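The abstract does not specify the exact fusion rule. For illustration only, here is a minimal sketch of score-level late fusion, assuming that each branch outputs class posteriors over AFEW's seven basic emotions, that the visual branch produces one posterior per temporal segment (averaged to clip level so both modalities cover the same span), and that the two modalities are combined with a fixed weight. The functions `late_fuse` and `predict` and the `video_weight` parameter are hypothetical and are not the MDSTN implementation.

```python
import numpy as np

# AFEW's seven basic-emotion classes (ordering assumed for this sketch).
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]


def late_fuse(video_segment_probs, audio_probs, video_weight=0.5):
    """Hypothetical score-level late fusion of video and audio posteriors.

    video_segment_probs: shape (num_segments, num_classes), one posterior per
        temporal segment from the visual branch.
    audio_probs: shape (num_classes,), one clip-level posterior from the
        audio classifier.
    video_weight: assumed fixed weight in [0, 1] for the visual branch.
    """
    video = np.asarray(video_segment_probs, dtype=float)
    audio = np.asarray(audio_probs, dtype=float)
    if video.ndim != 2 or audio.ndim != 1 or video.shape[1] != audio.shape[0]:
        raise ValueError("expected (segments, classes) video and (classes,) audio")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must lie in [0, 1]")

    # Bring both modalities to the same clip-level temporal span by
    # averaging the per-segment video posteriors.
    video_clip = video.mean(axis=0)

    # Weighted average of the two clip-level posteriors.
    return video_weight * video_clip + (1.0 - video_weight) * audio


def predict(fused):
    """Return the emotion label with the highest fused score."""
    return EMOTIONS[int(np.argmax(fused))]


# Toy posteriors for illustration only (not results from the paper).
video_segments = [
    [0.10, 0.50, 0.10, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.30, 0.10, 0.10, 0.10, 0.10, 0.20],
]
audio = [0.20, 0.20, 0.10, 0.10, 0.20, 0.10, 0.10]

fused = late_fuse(video_segments, audio, video_weight=0.6)
print(predict(fused))  # Disgust
```

Because the fused vector is a convex combination of two probability distributions, it remains a valid distribution; in a setup like this the weight would typically be tuned on a validation split rather than fixed by hand.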

Journal: International Journal of Intelligent Networks
Publication Date: Oct 1, 2024
License: CC BY-NC-ND

Similar Papers

Wearable-Based Affect Recognition-A Review
Philip Schmidt ... Kristof Van Laerhoven
Sensors, Vol. 19, 20 Sep 2019

Robust Person Re-Identification Through the Combination of Metric Learning and Late Fusion Techniques
Hong-Quan Nguyen ... Thi-Lan Le
Vietnam Journal of Computer Science, Vol. 08, 19 Jan 2021

Deep Learning and Late Fusion Technique in Medical X-ray Image
Alebiosu David Olayemi ... Anuja Dharmaratne
13 Dec 2020

MFCC and Prosodic Feature Extraction Techniques: A Comparative Study
Nilu Singh ... Raj Shree
International Journal of Computer Applications, Vol. 54, 25 Sep 2012
