Abstract

This study presents a scheme for multilingual speech emotion recognition. Recognizing emotion from speech generally relies on training data specific to a speaker and language, so an unseen target speaker or language can pose significant challenges. To address this, we first explore 215 acoustic features extracted from emotional speech. Second, we carry out speaker normalization and feature selection to develop a shared standard acoustic parameter set for multiple languages. Third, we map acoustics onto emotion dimensions using a three-layer model composed of acoustic features, semantic primitives, and emotion dimensions. Finally, we classify the continuous emotion-dimensional values into basic categories using logistic model trees. The proposed approach was tested on Japanese, German, Chinese, and English emotional speech corpora. Recognition performance was examined through cross-speaker and cross-corpus evaluation, which showed that our strategy is particularly well suited to multilingual emotion recognition even with an unseen speaker or language. The experimental results were reasonably comparable with those of monolingual emotion recognizers used as a reference.
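
To make the pipeline concrete, the sketch below walks through the four stages named in the abstract in Python. It is an illustrative assumption, not the paper's implementation: the function names, toy data sizes, and per-speaker z-score normalization are hypothetical; simple least-squares mappings stand in for the paper's actual three-layer estimator; feature selection is omitted; and scikit-learn's DecisionTreeClassifier substitutes for logistic model trees, which are typically run in Weka rather than scikit-learn.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def speaker_normalize(features, speaker_ids):
        # Z-score each speaker's features to reduce speaker-specific bias
        # (one common normalization choice; the paper's scheme may differ).
        normed = np.empty_like(features, dtype=float)
        for spk in np.unique(speaker_ids):
            idx = speaker_ids == spk
            mu = features[idx].mean(axis=0)
            sigma = features[idx].std(axis=0) + 1e-8
            normed[idx] = (features[idx] - mu) / sigma
        return normed

    def fit_linear_layer(X, Y):
        # Least-squares mapping from one layer to the next, e.g. acoustic
        # features -> semantic primitives, or primitives -> emotion dimensions.
        X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
        W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
        return W

    def apply_linear_layer(X, W):
        X1 = np.hstack([X, np.ones((X.shape[0], 1))])
        return X1 @ W

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy sizes and random data for illustration only (not the 215 features
        # or the annotated corpora used in the study).
        n_utts, n_feats, n_prims, n_dims = 200, 20, 5, 2
        acoustic = rng.normal(size=(n_utts, n_feats))
        speakers = rng.integers(0, 4, size=n_utts)
        primitives = rng.normal(size=(n_utts, n_prims))   # annotated semantic primitives
        dimensions = rng.normal(size=(n_utts, n_dims))    # e.g. valence, arousal
        categories = rng.integers(0, 4, size=n_utts)      # basic emotion labels

        # Stage 1-2: speaker normalization (feature selection omitted here).
        X = speaker_normalize(acoustic, speakers)
        # Stage 3: three-layer mapping, acoustics -> primitives -> dimensions.
        W1 = fit_linear_layer(X, primitives)
        prims_pred = apply_linear_layer(X, W1)
        W2 = fit_linear_layer(prims_pred, dimensions)
        dims_pred = apply_linear_layer(prims_pred, W2)
        # Stage 4: map continuous dimension values to emotion categories
        # (decision tree as a stand-in for logistic model trees).
        clf = DecisionTreeClassifier(max_depth=3).fit(dims_pred, categories)
        print("toy accuracy:", clf.score(dims_pred, categories))

In practice the two mappings would be trained on annotated emotional speech and the category classifier evaluated on held-out speakers or corpora, mirroring the cross-speaker and cross-corpus evaluation described above.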
