Abstract

The widespread adoption of Multimodal Large Language Models (MLLMs) has opened new research directions in the context of video content understanding and classification. Emotion recognition from videos aims to automatically detect human emotions such as anxiety and fear. It requires jointly processing multiple data modalities, including acoustic and visual streams. State-of-the-art approaches leverage transformer-based architectures to combine multimodal sources. However, the impressive performance of MLLMs in content retrieval and generation offers new opportunities to extend the capabilities of existing emotion recognizers. This paper explores the performance of MLLMs on the emotion recognition task in a zero-shot learning setting. Furthermore, it presents an extension of a state-of-the-art architecture based on MLLM content reformulation. The performance achieved on the Hume-Reaction benchmark shows that MLLMs are still unable to outperform the state-of-the-art average performance but, notably, are more effective than traditional transformers in recognizing emotions whose intensity deviates from the sample average.
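
As an illustration of the zero-shot setting described above, the following Python sketch shows one possible way to prompt an MLLM for per-emotion intensity scores on a video clip. It is not the paper's released code: the `call_mllm` stub, the prompt wording, and the emotion list (assumed here to follow the Hume-Reaction annotation scheme) are all assumptions for demonstration purposes.

```python
# Minimal sketch of zero-shot emotion-intensity prediction with an MLLM.
# `call_mllm` is a hypothetical placeholder standing in for whichever
# multimodal model is actually queried; it is not an API from the paper.

import json
from typing import Dict, List

# Emotion classes assumed to match the Hume-Reaction annotation scheme.
EMOTIONS: List[str] = [
    "Adoration", "Amusement", "Anxiety", "Disgust",
    "Empathic Pain", "Fear", "Surprise",
]

PROMPT_TEMPLATE = (
    "You are given frames sampled from a short reaction video. "
    "For each of the following emotions, rate the intensity expressed by the "
    "person on a scale from 0.0 (absent) to 1.0 (very strong): {emotions}. "
    "Answer with a JSON object mapping each emotion to a number."
)


def call_mllm(prompt: str, frames: List[bytes]) -> str:
    """Hypothetical stub: a real implementation would send the prompt and
    the sampled frames to the chosen MLLM and return its text answer."""
    return json.dumps({e: 0.0 for e in EMOTIONS})


def zero_shot_emotion_scores(frames: List[bytes]) -> Dict[str, float]:
    """Query the MLLM once per clip and parse its JSON answer into scores."""
    prompt = PROMPT_TEMPLATE.format(emotions=", ".join(EMOTIONS))
    raw_answer = call_mllm(prompt, frames)
    scores = json.loads(raw_answer)
    # Clamp to [0, 1] in case the model returns out-of-range values.
    return {e: min(max(float(scores.get(e, 0.0)), 0.0), 1.0) for e in EMOTIONS}


if __name__ == "__main__":
    dummy_frames: List[bytes] = []  # placeholder for sampled video frames
    print(zero_shot_emotion_scores(dummy_frames))
```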
