Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.

Itsuki Toyoshima, Yoshifumi Okada, Ryunosuke Uchiyama, Mayu Tada, Momoko Ishimaru

doi:10.3390/s23031743

Abstract

The existing research on emotion recognition commonly uses mel spectrogram (MelSpec) and Geneva minimalistic acoustic parameter set (GeMAPS) as acoustic parameters to learn the audio features. MelSpec can represent the time-series variations of each frequency but cannot manage multiple types of audio features. On the other hand, GeMAPS can handle multiple audio features but fails to provide information on their time-series variations. Thus, this study proposes a speech emotion recognition model based on a multi-input deep neural network that simultaneously learns these two audio features. The proposed model comprises three parts, specifically, for learning MelSpec in image format, learning GeMAPS in vector format, and integrating them to predict the emotion. Additionally, a focal loss function is introduced to address the imbalanced data problem among the emotion classes. The results of the recognition experiments demonstrate weighted and unweighted accuracies of 0.6657 and 0.6149, respectively, which are higher than or comparable to those of the existing state-of-the-art methods. Overall, the proposed model significantly improves the recognition accuracy of the emotion "happiness", which has been difficult to identify in previous studies owing to limited data. Therefore, the proposed model can effectively recognize emotions from speech and can be applied for practical purposes with future development.
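The focal loss and the two reported metrics can be illustrated with a minimal sketch. It assumes the standard focal loss form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), and the definitions commonly used in speech emotion recognition: weighted accuracy as overall accuracy and unweighted accuracy as the mean of per-class recalls. The abstract does not state the hyperparameters or the exact metric definitions, so the function names, the defaults `gamma=2.0` and `alpha=1.0`, and the toy labels are all illustrative assumptions rather than the authors' settings.

```python
import math
from collections import defaultdict


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities (sums to 1)
    target : index of the true class
    gamma, alpha : assumed defaults, not values taken from the paper
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # Toy imbalanced data: "happy" is the minority class (hypothetical labels).
    y_true = ["neutral"] * 4 + ["happy"] * 2
    y_pred = ["neutral"] * 4 + ["happy", "neutral"]
    print("WA:", weighted_accuracy(y_true, y_pred))
    print("UA:", unweighted_accuracy(y_true, y_pred))

    # A confidently correct sample contributes far less loss than a hard one.
    print("easy sample:", focal_loss([0.1, 0.9], target=1))
    print("hard sample:", focal_loss([0.9, 0.1], target=1))
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy. Larger `gamma` values shrink the contribution of samples the model already classifies well, which is how the loss shifts attention toward under-represented classes such as "happiness".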


Journal: Sensors
Publication Date: Feb 3, 2023
Citations: 4
License type: CC BY 4.0

Similar Papers

Emotion recognition using facial and audio features
Tarun Krishna ... Shubham Gupta
09 Dec 2013

Multimodal Emotion Recognition Based on Deep Temporal Features Using Cross-Modal Transformer and Self-Attention
Bubai Maji ... Aurobinda Routray
04 Jun 2023

Investigating Multi-feature Selection and Ensembling for Audio Classification
Muhammad Turab ... Teerath Kumar
International Journal of Artificial Intelligence & Applications | VOL. 13
31 May 2022

Exploring relationships between audio features and emotion in music
Toiviaine Petri
Frontiers in Human Neuroscience | VOL. 3
01 Jan 2009
