Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning.

Cristina Luna-Jiménez, David Griol, Zoraida Callejas, Ricardo Kleinlein, Juan M Montero, Fernando Fernández-Martínez

doi:10.3390/s21227665

Open Access

https://doi.org/10.3390/s21227665

Abstract

Emotion recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and that their combination improves system performance.
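The evaluation setup named in the abstract (subject-wise 5-fold cross-validation and late fusion of two modalities) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the round-robin fold assignment, the weighted-average fusion rule, the `w_speech` weight, and the example posteriors are all assumptions chosen for illustration, and the abstract does not state which fusion rule was actually used. The only dataset facts relied on are that RAVDESS has 24 actors and eight emotion classes.

```python
from typing import List, Sequence

# The eight RAVDESS emotion classes.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]


def subject_wise_folds(actor_ids: Sequence[int], n_folds: int = 5) -> List[List[int]]:
    """Assign every actor to exactly one fold (round-robin, an assumption).

    Because folds are built from actor IDs rather than from recordings,
    no actor can appear in both the training and the test partition.
    """
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for i, actor in enumerate(sorted(set(actor_ids))):
        folds[i % n_folds].append(actor)
    return folds


def late_fusion(p_speech: Sequence[float], p_face: Sequence[float],
                w_speech: float = 0.5) -> List[float]:
    """Fuse two per-class posterior vectors by weighted averaging.

    This is one common late-fusion rule, used here as a stand-in;
    w_speech is a hypothetical weight for the speech modality.
    """
    if len(p_speech) != len(p_face):
        raise ValueError("posterior vectors must have the same length")
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    fused = [w_speech * s + (1.0 - w_speech) * f
             for s, f in zip(p_speech, p_face)]
    total = sum(fused)
    return [x / total for x in fused]  # renormalize to sum to 1


def predict(posterior: Sequence[float]) -> str:
    """Return the emotion label with the highest posterior."""
    best = max(range(len(posterior)), key=lambda i: posterior[i])
    return EMOTIONS[best]


if __name__ == "__main__":
    # RAVDESS actors are numbered 1..24.
    for k, test_actors in enumerate(subject_wise_folds(range(1, 25))):
        print(f"fold {k}: test actors {test_actors}")

    # Made-up posteriors for one clip; the two modalities disagree.
    p_speech = [0.05, 0.05, 0.50, 0.10, 0.20, 0.04, 0.03, 0.03]
    p_face = [0.05, 0.05, 0.30, 0.05, 0.45, 0.04, 0.03, 0.03]
    print(predict(p_speech), predict(p_face),
          predict(late_fusion(p_speech, p_face)))
```

In the made-up example the speech model favors "happy" and the face model favors "angry"; with equal weights the fused posterior favors "happy". Because fusion operates only on each model's output scores, the two recognizers can be trained and tuned independently.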

Highlights

Emotions are present in almost every decision and moment of our lives.
This difference may be explained by the dimension of the embeddings: AlexNet embeddings have a size of 4096, whereas the embeddings extracted from CNN-14 have a dimension of 2048, half the size.
CNN-14 outperformed the AlexNet results by 15.86% under the same conditions, without using a Voice Activity Detector (VAD). One cause of this difference could be the nature of the training data: AlexNet had pre-trained weights learned from ImageNet images, whereas CNN-14 was trained on Mel spectrograms extracted from audio.

Summary

Introduction

Emotions are present in almost every decision and moment of our lives. Recognizing emotions awakens interest, since knowing what others feel lets us interact with them more effectively. By analyzing individuals’ behavior, it is possible to detect a loss of trust or changes in emotions. This capability lets specific systems, such as Conversational Systems and Embodied Conversational Agents (ECAs) [1,2], react to these events and adapt their actions to improve interactions or modify the dialogue contents, tone, or facial expressions to create a better socio-affective user experience [3]. Systems able to recognize certain emotions (or deficits) can also help with the diagnosis of specific diseases (e.g., depressive disorders [4,5], Parkinson’s [6], etc.) and improve patients’ treatments. Another relevant application of facial expression recognition is automotive safety. Recognizing negative emotions such as stress, anger, or fatigue is crucial to avoid traffic accidents and increase road safety [7] in intelligent vehicles, allowing them to respond to the driver’s state.

Journal: Sensors (Basel, Switzerland)
Publication Date: Nov 18, 2021
Citations: 65
License type: CC BY 4.0

Similar Papers

A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset
Cristina Luna-Jiménez ... David Griol
Applied Sciences | VOL. 12
30 Dec 2021

A fine-tuning deep residual convolutional neural network for emotion recognition based on frequency-channel matrices representation of one-dimensional electroencephalography
Jichi Chen ... Enqiu He
Computer Methods in Biomechanics and Biomedical Engineering | VOL. ahead-of-print
24 Nov 2023

Adults' emotional states and recognition of emotion in young children
Charles R Carlson ... Frank P Gantz
Motivation and Emotion | VOL. 7
01 Mar 1983

Development of a Facial Emotion Recognition System Based on HMM Structure Optimization Using the Harmony Search Algorithm
Kwang-Eun Ko ... Kwee-Bo Sim
Journal of Korean Institute of Intelligent Systems | VOL. 21
25 Jun 2011
