Abstract

Emotion identification based on multimodal data (e.g., audio, video, and text) is a demanding and important research field with a wide range of applications. In this context, this work conducts a rigorous exploration of model-level fusion to identify an optimal multimodal model for emotion recognition from audio and video modalities. More specifically, separate novel feature extractor networks are proposed for the audio and video data. An optimal multimodal emotion recognition model is then created by fusing the audio and video features at the model level. The performance of the proposed models is assessed on two benchmark multimodal datasets, the Ryerson Audio–Visual Database of Emotional Speech and Song (RAVDESS) and the Surrey Audio–Visual Expressed Emotion (SAVEE) dataset, using various performance metrics. The proposed models achieve high predictive accuracies of 99% and 86% on the SAVEE and RAVDESS datasets, respectively. The effectiveness of the models is also verified by comparing their performance with that of existing emotion recognition models. Several case studies further explore the models' ability to capture the variability of speakers' emotional states in publicly available real-world audio–visual media.
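As a rough illustration of the model-level fusion described above, the following PyTorch sketch shows two independent feature extractor branches (one per modality) whose learned embeddings are concatenated before a shared classification head. The branch architectures, embedding sizes, input shapes, and the eight emotion classes are illustrative assumptions, not the authors' exact networks.

```python
import torch
import torch.nn as nn

# Minimal sketch of model-level fusion: each modality has its own feature
# extractor, and the learned embeddings are concatenated before a shared
# classifier. Layer sizes and input shapes are illustrative assumptions.

class AudioBranch(nn.Module):
    def __init__(self, n_mfcc=40, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, embed_dim),
            nn.ReLU(),
        )

    def forward(self, x):       # x: (batch, n_mfcc, time)
        return self.net(x)      # (batch, embed_dim)

class VideoBranch(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embed_dim),
            nn.ReLU(),
        )

    def forward(self, x):       # x: (batch, 3, H, W), e.g. one face crop per clip
        return self.net(x)      # (batch, embed_dim)

class FusionModel(nn.Module):
    def __init__(self, n_classes=8, embed_dim=128):
        super().__init__()
        self.audio = AudioBranch(embed_dim=embed_dim)
        self.video = VideoBranch(embed_dim=embed_dim)
        # Model-level fusion: concatenate the per-modality embeddings,
        # then map the joint representation to emotion classes.
        self.classifier = nn.Linear(2 * embed_dim, n_classes)

    def forward(self, audio_x, video_x):
        fused = torch.cat([self.audio(audio_x), self.video(video_x)], dim=1)
        return self.classifier(fused)

# Usage with dummy inputs
model = FusionModel()
logits = model(torch.randn(4, 40, 100), torch.randn(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 8])
```

Because fusion happens on learned representations rather than raw signals or final predictions, the shared classifier can, in principle, exploit cross-modal interactions between the audio and video features.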
