Attention and Feature Selection for Automatic Speech Emotion Recognition Using Utterance and Syllable-Level Prosodic Features

Starlet Ben Alex, Leena Mary, Ben P Babu

doi:10.1007/s00034-020-01429-3

Abstract

This work attempts to recognize emotions from human speech using prosodic information represented by variations in duration, energy, and fundamental frequency ($F_0$) values. For this, the speech signal is first automatically segmented into syllables. Prosodic features at the utterance level (15 features) and the syllable level (10 features) are extracted using the syllable boundaries and used to train separate deep neural network classifiers. The effectiveness of the proposed approach is demonstrated on the German speech corpus EMOTional Sensitivity ASistance System (EmotAsS) for people with disabilities, the dataset used for the Interspeech 2018 Atypical Affect Sub-Challenge. On evaluation, the initial set of prosodic features yields an unweighted average recall (UAR) of 30.15%. Fusing the decision scores of these features with those of spectral features gives a UAR of 36.71%. This paper also employs an attention mechanism and feature selection using resampling-based recursive feature elimination (RFE) to enhance system performance. Implementing attention and feature selection followed by score-level fusion improves the UAR to 36.83% for prosodic features and 40.96% for the overall fusion. Fusing the scores of the best individual system of the Atypical Affect Sub-Challenge with those of the proposed system provides a UAR of 43.71%, above the best test result reported. The effectiveness of the proposed system has also been demonstrated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, with a UAR of 63.83%.
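The two quantities the abstract relies on, UAR and score-level fusion, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy scores, and the equal fusion weights are hypothetical, and the abstract does not state how the authors weight or normalize their scores. UAR is the mean of per-class recalls, so a small class counts as much as a large one; fusion here is a weighted sum of per-class decision scores from two systems.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally, whatever its size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def fuse_scores(score_sets, weights):
    """Weighted sum of per-class scores from several systems, utterance by utterance."""
    n_utts = len(score_sets[0])
    n_classes = len(score_sets[0][0])
    return [
        [sum(w * s[i][k] for s, w in zip(score_sets, weights)) for k in range(n_classes)]
        for i in range(n_utts)
    ]


def argmax_labels(scores):
    """Pick the highest-scoring class index for each utterance."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy data (hypothetical): 6 utterances, 3 emotion classes.
y_true = [0, 0, 0, 1, 1, 2]
prosodic_scores = [
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.2, 0.2, 0.6],
]
spectral_scores = [
    [0.5, 0.4, 0.1], [0.4, 0.5, 0.1], [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1], [0.3, 0.6, 0.1], [0.5, 0.1, 0.4],
]

# Equal weights are an assumption, not the paper's setting.
fused_scores = fuse_scores([prosodic_scores, spectral_scores], [0.5, 0.5])

uar_prosodic = unweighted_average_recall(y_true, argmax_labels(prosodic_scores))
uar_spectral = unweighted_average_recall(y_true, argmax_labels(spectral_scores))
uar_fused = unweighted_average_recall(y_true, argmax_labels(fused_scores))
print(uar_prosodic, uar_spectral, uar_fused)
```

In this toy case the two systems make different errors, so the fused scores recover utterances that either system alone misclassifies. That complementarity is the usual motivation for score-level fusion; it does not reproduce the figures reported in the abstract.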

Journal: Circuits, Systems, and Signal Processing
Publication Date: May 14, 2020
Citations: 24

Similar Papers

Hierarchical Component-attention Based Speaker Turn Embedding for Emotion Recognition
Shuo Liu ... Bjorn Schuller
01 Jul 2020

Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database
Yeonguk Yu ... Yoon-Joong Kim
Electronics, Vol. 9
26 Apr 2020

Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition
Eesung Kim ... Hyungchan Song
Sensors, Vol. 20
04 May 2020

Utterance and Syllable Level Prosodic Features for Automatic Emotion Recognition
Starlet Ben Alex ... Leena Mary
01 Dec 2018
