Abstract

Emotions are an integral part of human interactions and are significant factors in determining user satisfaction or customer opinion. Speech emotion recognition (SER) modules also play an important role in the development of human–computer interaction (HCI) applications. A tremendous number of SER systems have been developed over the last decades. Attention-based deep neural networks (DNNs) have been shown to be suitable tools for mining information that is unevenly distributed in time in multimedia content. The attention mechanism has recently been incorporated into DNN architectures to also emphasise emotionally salient information. This paper reviews recent developments in SER and examines the impact of various attention mechanisms on SER performance. An overall comparison of system accuracies is performed on the widely used IEMOCAP benchmark database.

Highlights

  • The aim of human–computer interaction (HCI) is to create a more effective and natural communication interface between people and computers, but its focus also lies on aesthetic design, a pleasant user experience, support for human development, improved online learning, etc.

  • The paper is divided into two main parts. The first part discusses the general concept of speech emotion recognition (SER) and related works, including a description of novel and deep features, transfer learning, and generalisation techniques. The second part focuses on deep neural network (DNN) models that incorporate an attention mechanism

  • If the class accuracies are weighted according to the number of instances per class, the evaluation metric may not reflect the unbalanced nature of the data
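The last highlight can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, the labels are made-up toy data, and the terms "weighted accuracy" (per-class accuracies weighted by class size) and "unweighted accuracy" (plain mean of per-class accuracies) follow common usage in the SER literature rather than a definition quoted from this page.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's instances predicted correctly}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def weighted_accuracy(y_true, y_pred):
    """Per-class accuracies weighted by class size (equals overall accuracy)."""
    acc = per_class_accuracy(y_true, y_pred)
    totals = Counter(y_true)
    n = len(y_true)
    return sum(acc[label] * totals[label] / n for label in totals)


def unweighted_accuracy(y_true, y_pred):
    """Plain mean of per-class accuracies; every class counts equally."""
    acc = per_class_accuracy(y_true, y_pred)
    return sum(acc.values()) / len(acc)


# Toy data (invented for illustration): 8 "neutral" and 2 "angry" utterances,
# scored against a degenerate classifier that always predicts "neutral".
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

wa = weighted_accuracy(y_true, y_pred)    # 0.8: dominated by the majority class
ua = unweighted_accuracy(y_true, y_pred)  # 0.5: exposes the missed minority class
print(f"WA = {wa:.2f}, UA = {ua:.2f}")
```

In this toy case the weighted figure looks acceptable even though one class is never recognised, which is why reporting both figures is useful on unbalanced data.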

Summary

Introduction

The aim of human–computer interaction (HCI) is to create a more effective and natural communication interface between people and computers, but its focus also lies on aesthetic design, a pleasant user experience, support for human development, improved online learning, etc. The basic concept consists of the analysis of input text, voice, and visual clues in order to assess the subject’s psychiatric disorder and inform about diagnosis and treatment. Another suggestion for emotion recognition application is a conversational chatbot, where speech emotion identification can play a role in better conversation [4]. The section is concluded by a short discussion on the impact of the attention mechanism (AM) on SER system performance.

Electronics 2021, 10, 1163

Scope and Methodology
Evaluation Metrics
Speech Emotion Recognition and Deep Learning
Databases of Emotional Speech
Acoustic Features
Data-Driven Features
Temporal Variations Modelling
Transfer Learning
Generalisation Techniques
Data Augmentation
Cross-Domain Recognition
DNN Systems Comparison
Attention Mechanism in Speech Emotion Recognition
Attentive Deep Recurrent Neural Networks
Attentive Deep Convolutional Neural Network
Attentive Convolutional–Recurrent Deep Neural Network
Impact of Attention Mechanism on SER
Conclusions