Recognizing Emotions in Video Using Multimodal DNN Feature Fusion

Jennifer Williams, Ramona Comanescu, Steven Kleinegesse, Oana Radu

doi:10.18653/v1/w18-3302

Abstract

We describe our system for input-level multimodal fusion of audio, video, and text, built to recognize emotions and their intensities for the 2018 First Grand Challenge on Computational Modeling of Human Multimodal Language. Our approach combines input-level feature fusion with sequence learning in Bidirectional Long Short-Term Memory (BLSTM) deep neural networks (DNNs). We show that this fusion approach outperforms unimodal predictors. The system performs 6-way simultaneous classification and regression, which allows overlapping emotion labels within a video segment. It achieves an overall binary accuracy of 90%, an overall 4-class accuracy of 89.2%, and an overall mean absolute error (MAE) of 0.12. Our work shows that an early fusion technique can effectively predict the presence of multi-label emotions as well as their coarse-grained intensities. The presented multimodal approach provides a simple and robust baseline on this new Grand Challenge dataset. We also provide a detailed analysis of the emotion intensity distributions output by our DNN, together with a discussion of the inherent difficulty of this task.
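As a rough illustration of what input-level (early) fusion and the reported metrics involve, the following minimal sketch concatenates time-aligned audio, video, and text feature vectors frame by frame, and computes MAE and a thresholded binary accuracy over per-emotion intensity scores. It is not the authors' implementation: the function names, the toy feature dimensions, and the presence threshold are all assumptions made for illustration, and the BLSTM that would consume the fused sequence is omitted.

```python
from typing import List, Sequence

Frame = Sequence[float]


def fuse_inputs(audio: Sequence[Frame],
                video: Sequence[Frame],
                text: Sequence[Frame]) -> List[List[float]]:
    """Early fusion: concatenate the modality vectors at each time step.

    Assumes the three streams are already aligned to the same number
    of time steps (an assumption of this sketch).
    """
    if not (len(audio) == len(video) == len(text)):
        raise ValueError("modalities must have the same number of time steps")
    return [list(a) + list(v) + list(t) for a, v, t in zip(audio, video, text)]


def mean_absolute_error(pred: Sequence[Frame], gold: Sequence[Frame]) -> float:
    """MAE averaged over every (segment, emotion) intensity pair."""
    diffs = [abs(p - g)
             for p_row, g_row in zip(pred, gold)
             for p, g in zip(p_row, g_row)]
    return sum(diffs) / len(diffs)


def binary_accuracy(pred: Sequence[Frame], gold: Sequence[Frame],
                    threshold: float = 0.5) -> float:
    """Share of (segment, emotion) pairs where presence is predicted correctly.

    An emotion counts as present when its intensity exceeds `threshold`;
    the threshold value is hypothetical, not taken from the paper.
    """
    hits = [(p > threshold) == (g > threshold)
            for p_row, g_row in zip(pred, gold)
            for p, g in zip(p_row, g_row)]
    return sum(hits) / len(hits)


# Toy example: 2 time steps with 2 audio, 1 video, and 3 text features.
audio = [[0.1, 0.2], [0.3, 0.4]]
video = [[1.0], [2.0]]
text = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
fused = fuse_inputs(audio, video, text)  # each fused frame has 2 + 1 + 3 = 6 values

# Toy intensity scores for 2 segments and 2 emotions.
pred = [[0.0, 1.0], [2.0, 0.0]]
gold = [[0.0, 2.0], [2.0, 1.0]]
mae = mean_absolute_error(pred, gold)
acc = binary_accuracy(pred, gold)
```

In a full system the fused sequence would be fed to a sequence model such as a BLSTM with one output per emotion; allowing several outputs to be non-zero at once is what permits overlapping emotion labels.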

Publication Date: Jan 1, 2018
Citations: 78
License type: CC BY

Similar Papers

Deep fusion of multi-modal features for brain tumor image segmentation
Guying Zhang ... Hancan Zhu
Heliyon | VOL. 9
01 Aug 2023

Air quality index prediction using multivariate deep neural networks: A case study of a proposed state capital in India
Venkata Siva Raja Prasad Sunku ... Rambabu Mukkamala
Journal of Air Pollution and Health
08 Oct 2023

Adaptive Multimodal Fusion With Attention Guided Deep Supervision Net for Grading Hepatocellular Carcinoma.
Shangxuan Li ... Guangyi Wang
IEEE Journal of Biomedical and Health Informatics | VOL. 26
01 Aug 2022

A Variational Autoencoder-Based Dimensionality Reduction Technique for Generation Forecasting in Cyber-Physical Smart Grids
Devinder Kaur ... Md Apel Mahmud
01 Jun 2021
