Abstract

Automatic speech emotion recognition (SER) is a challenging component of human-computer interaction (HCI). Existing literature mainly focuses on evaluating SER performance by training and testing on a single corpus with a single language setting. However, in many practical applications, there are great differences between the training corpus and the testing corpus. Due to the diversity of speech emotion corpora and languages, most previous SER methods do not perform well when applied in real-world cross-corpus or cross-language scenarios. Inspired by the powerful feature learning ability of recently emerged deep learning techniques, various advanced deep learning models have increasingly been adopted for cross-corpus SER. This paper aims to provide an up-to-date and comprehensive survey of cross-corpus SER, especially of the various deep learning techniques associated with supervised, unsupervised, and semi-supervised learning in this area. In addition, this paper highlights the challenges and opportunities of cross-corpus SER tasks, and points out future trends.

Highlights

  • Emotion recognition is an important direction in psychology, biology, and computer science, and has recently received extensive attention from the engineering research field

  • Based on the extracted INTERSPEECH-2010 Paralinguistic Challenge feature set, with 1,582 features derived from low-level descriptors (LLDs), a new method of transfer non-negative matrix factorization (TNMF) (Song et al, 2016b), in which the non-negative matrix factorization (NMF) and the maximum mean discrepancy (MMD) algorithms were combined, was developed for cross-corpus speech emotion recognition (SER)

  • In Latif et al (2018b), considering the fact that deep belief networks (DBNs) have a strong generalization power, the study presented a transfer learning technique based on DBNs to improve the performance of SER on cross-language and cross-corpus datasets
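The MMD criterion mentioned in the TNMF highlight above measures the distance between the source- and target-corpus feature distributions in a kernel space; transfer methods minimize it so that features learned on one corpus remain valid on another. As an illustrative sketch only (the exact TNMF formulation is in Song et al, 2016b; the RBF kernel, the `gamma` value, and the biased estimator below are assumptions for demonstration), the squared MMD between two feature sets can be computed as:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel matrix between the rows of a and b.
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy
    # between samples x (source corpus) and y (target corpus):
    # MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

In a transfer setting such as TNMF, a term of this form is added to the factorization objective so that the learned low-dimensional representations of the two corpora are pulled toward a common distribution.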

Introduction

Emotion recognition is an important direction in psychology, biology, and computer science, and has recently received extensive attention from the engineering research field. One of the starting points for emotion recognition is to assist in designing more humane human-computer interaction (HCI) methods, since emotion plays a key role in HCI and artificial intelligence (Cowie et al, 2001; Ramakrishnan and El Emary, 2013; Feng and Chaspari, 2020). Traditional HCI is mainly carried out through keyboard, mouse, screen, etc. It pursues only convenience and accuracy, and cannot understand or adapt to people's emotions or mood. If a computer lacks the ability to understand and express emotions, it is difficult to expect it to have the same intelligence as human beings. The purpose of affective computing (Picard, 2010) is to endow computers with the ability to observe, understand, and generate emotional responses similar to those of humans, and to enable computers to interact naturally, cordially, and vividly like humans.
