Abstract
In this article, we study emotion detection from speech in a speaker-specific scenario. By parameterizing the excitation component of voiced speech, the study explores deviations between emotional speech (e.g., speech produced in anger, happiness, sadness, etc.) and neutral (i.e., non-emotional) speech to develop an automatic emotion detection system. The excitation features used in this study are the instantaneous fundamental frequency, the strength of excitation, and the energy of excitation. The Kullback-Leibler (KL) distance is computed to measure the similarity between feature distributions of emotional and neutral speech. Based on the KL distance between a test utterance and an utterance produced in a neutral state by the same speaker, the system makes a detection decision. In training the proposed system, only three neutral utterances produced by the speaker are used, unlike in most existing emotion recognition and detection systems, which call for large amounts of training data (both emotional and neutral) from several speakers. In addition, the proposed system is independent of language and lexical content. The system is evaluated using two databases of emotional speech. The performance of the proposed detection method is shown to be better than that of reference methods.
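As a rough sketch of the decision rule described above (not the authors' implementation), the distribution of a per-cycle excitation feature such as instantaneous F0 can be approximated with a histogram, and a symmetric KL distance between the test and neutral-reference histograms can be compared against a threshold. The bin settings, feature range, and threshold below are illustrative assumptions:

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler distance between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps   # eps avoids log(0) on empty bins
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def feature_distribution(values, bins=40, value_range=(50.0, 400.0)):
    """Histogram of a per-frame excitation feature (here, instantaneous F0 in Hz)."""
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    return hist

def detect_emotion(test_feat, neutral_feat, threshold):
    """Flag the test utterance as emotional when its feature distribution
    deviates from the speaker's neutral reference by more than a threshold."""
    p = feature_distribution(test_feat)
    q = feature_distribution(neutral_feat)
    return kl_distance(p, q) > threshold

# Toy illustration with synthetic F0 values: neutral speech around 120 Hz,
# "angry" speech shifted to a higher, more variable F0 (a known tendency).
rng = np.random.default_rng(0)
neutral = rng.normal(120.0, 10.0, 2000)
angry = rng.normal(180.0, 25.0, 1000)
print(detect_emotion(angry, neutral[:1000], threshold=1.0))   # large KL distance
print(detect_emotion(neutral[:1000], neutral[1000:], threshold=1.0))
```

In practice the same comparison would be repeated for each excitation feature (F0, strength of excitation, energy of excitation), with the threshold tuned per feature.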
Highlights
In addition to its linguistic contents, speech contains rich information about the speaker, such as gender, age and emotional state
In this study, we proposed an automatic emotion detection system from speech using excitation features extracted around glottal closure instants (GCIs)
Using the KL distance, the system measures the deviation between the reference utterance produced in a neutral state and a test utterance of emotional speech
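To illustrate the highlight above about GCI-based excitation features: once GCI timestamps have been detected (by any GCI detector; detection itself is outside this sketch), the instantaneous fundamental frequency of each glottal cycle is simply the reciprocal of the interval between successive GCIs. A minimal sketch, with hypothetical GCI times:

```python
import numpy as np

def instantaneous_f0(gci_times):
    """Instantaneous fundamental frequency (Hz) per glottal cycle,
    computed as the reciprocal of successive GCI intervals."""
    gci_times = np.asarray(gci_times, dtype=float)
    periods = np.diff(gci_times)   # glottal cycle lengths in seconds
    return 1.0 / periods

# Hypothetical GCIs of a steady 100 Hz voiced segment (one GCI every 10 ms).
gcis = np.arange(0.0, 0.05, 0.01)
print(instantaneous_f0(gcis))
```

Because one F0 value is obtained per glottal cycle rather than per fixed-length frame, this representation tracks rapid pitch changes, which is useful when comparing emotional and neutral speech.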
Summary
In addition to its linguistic contents, speech contains rich information about the speaker, such as gender, age and emotional state. In deep-learning-based approaches, a network (e.g., a convolutional neural network (CNN) or a bidirectional long short-term memory (BLSTM) network) is trained to conduct the recognition task directly from the input (either from the raw signal waveform or from the spectrogram) [20], [21]. These approaches are data-driven and call for large amounts of training data [16]–[21].