Abstract

In this paper, we propose a novel emotion recognition method based on the underlying emotional characteristics extracted by a conditional adversarial auto-encoder (CAAE), in which both acoustic and lexical features are used as inputs. The acoustic features are generated by calculating statistical functionals of low-level descriptors and by a deep neural network (DNN). These acoustic features are concatenated with three types of lexical features extracted from the text: a sparse representation, a distributed representation, and affective lexicon-based dimensions. The CAAE yields two-dimensional latent representations similar to vectors in the valence-arousal space, which can be directly mapped onto the emotional classes without the need for a sophisticated classifier. In contrast to a previous attempt that applied a CAAE to acoustic features only, the proposed approach enhances emotion recognition performance because the combined acoustic and lexical features provide sufficient discriminative power. Experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus showed that our method outperformed the previously reported best results on the same corpus, achieving 76.72% unweighted average recall.
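
As a rough illustration of the architecture described above, the following is a minimal sketch of a conditional adversarial auto-encoder in PyTorch. The layer sizes, the input dimensionality, and the training details are illustrative assumptions, not the authors' exact configuration: the encoder compresses the concatenated acoustic and lexical feature vector into a 2-D latent code, the decoder reconstructs the input conditioned on the emotion label, and the discriminator pushes the latent codes toward a chosen prior.

    import torch
    import torch.nn as nn

    FEAT_DIM = 512     # assumed size of the concatenated acoustic+lexical vector
    NUM_CLASSES = 4    # e.g., angry / happy / sad / neutral on IEMOCAP
    LATENT_DIM = 2     # two dimensions, analogous to valence-arousal

    class Encoder(nn.Module):
        """Maps an utterance-level feature vector to a 2-D latent code."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                nn.Linear(256, LATENT_DIM))
        def forward(self, x):
            return self.net(x)

    class Decoder(nn.Module):
        """Reconstructs the features from the latent code and the class label."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(LATENT_DIM + NUM_CLASSES, 256), nn.ReLU(),
                nn.Linear(256, FEAT_DIM))
        def forward(self, z, y_onehot):
            return self.net(torch.cat([z, y_onehot], dim=1))

    class Discriminator(nn.Module):
        """Separates prior samples from encoded codes, shaping the latent space."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                nn.Linear(64, 1))
        def forward(self, z):
            return self.net(z)   # raw logit; pair with BCEWithLogitsLoss

    # Training alternates three steps per mini-batch:
    #   1. reconstruction: minimize MSE between decoder output and input features,
    #   2. discriminator: real = samples drawn from the prior, fake = encoder outputs,
    #   3. regularization: update the encoder so its codes fool the discriminator.

Because the latent space is only two-dimensional and is shaped during training, a test utterance can be classified by encoding it and reading off the region of the latent space it falls into. The reported unweighted average recall is macro-averaged recall, which can be computed with, e.g., sklearn.metrics.recall_score(y_true, y_pred, average="macro").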

Highlights

  • Emotions play an important role in successful communication among humans [1], and increasing attention is being given to recognizing, interpreting, and processing emotional information effectively [2,3,4]

  • The distributions of the learned latent vectors extracted by a conditional adversarial auto-encoder (CAAE) from acoustic features are shown for the training and test sets in Figure 4a,b, respectively

  • We can see that the discriminant power of the two-dimensional latent vectors extracted from the acoustic features alone may not be strong enough to determine the emotional class of a given utterance

Introduction

Emotions play an important role in successful communication among humans [1], and increasing attention is being given to recognizing, interpreting, and processing emotional information effectively [2,3,4]. Much research has been devoted to recognizing human emotion from the speech signal based on acoustic features [5,6,7,8,9,10,11,12,13,14,15,16,17,18], lexical features [19,20], or both [21,22,23,24,25,26,27,28,29,30,31,32]. Jin et al. [22] used three types of acoustic features: low-level descriptors (LLDs), Gaussian supervectors, and bag-of-audio-words. These acoustic features were combined with an e-vector, a lexical feature that adopts a salience information weighting scheme on top of a bag-of-words (BoW) representation. Gamage et al. [23] suggested another weighting
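
To make the acoustic front end of such systems concrete, the sketch below computes a fixed-length utterance vector by applying statistical functionals to frame-level LLDs, assuming librosa for the descriptors. The specific LLD and functional sets used in the cited works are larger and differ in detail; this is only an illustration of the general recipe.

    import numpy as np
    import librosa

    def utterance_vector(wav_path):
        """Statistical functionals over frame-level LLDs -> fixed-length vector."""
        y, sr = librosa.load(wav_path, sr=16000)               # mono, 16 kHz
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # (13, n_frames)
        zcr = librosa.feature.zero_crossing_rate(y)            # (1, n_frames)
        rms = librosa.feature.rms(y=y)                         # (1, n_frames)
        lld = np.vstack([mfcc, zcr, rms])                      # (15, n_frames)
        # Functionals collapse the time axis so utterances of any length map
        # to vectors of the same size (15 LLDs x 4 functionals = 60 dims here).
        funcs = (np.mean, np.std, np.min, np.max)
        return np.concatenate([f(lld, axis=1) for f in funcs])

A fixed-length acoustic vector of this kind is what would then be concatenated with the lexical representations before being passed to a classifier or, in the proposed method, to the CAAE.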
