Abstract

This paper introduces a group of novel speech-signal descriptors that reflect phoneme-pronunciation variability and that are potentially useful features for emotion sensing. The proposed group comprises statistical parameters of Poincare maps derived from the formant-frequency evolution and the energy evolution of voiced-speech segments. Two groups of Poincare-map characteristics were considered: descriptors of sample scatter, which reflect the magnitude of phone-uttering variations, and descriptors of cross-correlations among samples, which evaluate the consistency of those variations. Including the proposed characteristics in the pool of commonly used speech descriptors is shown to yield a noticeable increase, at the level of 10%, in emotion-sensing performance. A standard pattern-recognition methodology was adopted to evaluate the proposed descriptors, under the assumption that three- or four-dimensional feature spaces are sufficient for emotion sensing. Binary decision trees were selected for data classification, as they provide detailed information on the emotion-specific discriminative power of the various speech descriptors.
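
As a rough illustration of the two descriptor groups, the sketch below builds a Poincare map from a one-dimensional track (for example, a formant-frequency or energy contour of a voiced segment) and computes two scatter measures and one correlation measure. The function name `poincare_descriptors`, the `lag` parameter, and the particular statistics chosen (the SD1/SD2 dispersions common in Poincare-plot analysis, plus the Pearson correlation between successive samples) are assumptions made for illustration; the abstract does not give the paper's exact formulas.

```python
import math


def poincare_descriptors(track, lag=1):
    """Scatter and correlation statistics of a lag-`lag` Poincare map.

    The map is the set of points (track[n], track[n + lag]).
    All names and the choice of statistics are illustrative assumptions.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    x = track[:-lag]  # current samples
    y = track[lag:]   # samples `lag` steps later
    n = len(x)
    if n < 2:
        raise ValueError("track is too short for the requested lag")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

    # SD1: dispersion perpendicular to the identity line y = x,
    # i.e. the standard deviation of (y - x) / sqrt(2).
    sd1 = math.sqrt(max((var_x + var_y - 2 * cov) / 2, 0.0))
    # SD2: dispersion along the identity line,
    # i.e. the standard deviation of (y + x) / sqrt(2).
    sd2 = math.sqrt(max((var_x + var_y + 2 * cov) / 2, 0.0))
    # Pearson correlation between successive samples (0 for a constant track).
    if var_x > 0 and var_y > 0:
        corr = cov / math.sqrt(var_x * var_y)
    else:
        corr = 0.0
    return {"sd1": sd1, "sd2": sd2, "corr": corr}
```

For a steadily rising track such as `[1, 2, 3, 4, 5]`, all map points lie on a line parallel to the identity line, so `sd1` is zero and `corr` is 1. An alternating track such as `[1, -1, 1, -1, 1]` gives the opposite pattern, with `sd2` equal to zero and `corr` equal to -1.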

Highlights

  • Emotion sensing has become an increasingly important research direction in speech analysis

  • Together with the proposed twenty descriptors of voiced-speech variability, the pool of features considered for emotion recognition comprised a total of 146 elements

  • To enable cross-language comparisons of emotion sensing performance, we focused on classification of six emotions that are common to both databases

Summary

Introduction

Emotion sensing has become an increasingly important research direction in speech analysis. Because the proposed descriptors reflect aspects of speech production that differ from those involved in producing the pitch, energy and temporal components of the speech signal, one can hypothesize that they introduce useful, novel information for emotion classification. To verify this hypothesis, the following feature-evaluation methodology is adopted: a broad pool of features commonly used in emotion sensing is supplemented with the proposed descriptors, and feature selection driven by classification performance is executed on the resulting set. The proposed approach has been evaluated experimentally on a six-category emotion classification problem (joy, anger, boredom, sadness, fear and neutral) using databases of emotional speech in two different languages: German [14] and Polish [15].
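
The selection step described above can be sketched as a greedy forward search that keeps adding the feature that most improves a classification score, up to a small number of dimensions. This is only an illustration under stated assumptions: the excerpt does not say which search strategy the paper uses, and the score here is the leave-one-out accuracy of a nearest-centroid classifier, swapped in to keep the example self-contained, whereas the paper classifies with binary decision trees. The names `forward_select`, `loo_nearest_centroid_accuracy`, and `max_dim` are hypothetical.

```python
def loo_nearest_centroid_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`.

    A stand-in score (assumption); any callable with this signature can be used.
    Expects at least two samples.
    """
    correct = 0
    for i in range(len(rows)):
        sums, counts = {}, {}
        # Accumulate per-class sums over every sample except the held-out one.
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared distance from the held-out sample to a class centroid.
            return sum((rows[i][c] - sums[label][k] / counts[label]) ** 2
                       for k, c in enumerate(cols))

        if min(sums, key=dist) == labels[i]:
            correct += 1
    return correct / len(rows)


def forward_select(rows, labels, max_dim=3, score=loo_nearest_centroid_accuracy):
    """Greedily add the column that most improves `score`, up to `max_dim` columns."""
    chosen, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(chosen) < max_dim:
        cand_score, cand = max((score(rows, labels, chosen + [c]), c)
                               for c in remaining)
        if cand_score <= best:
            break  # no remaining feature improves the score
        chosen.append(cand)
        remaining.remove(cand)
        best = cand_score
    return chosen, best
```

Called as `forward_select(rows, labels, max_dim=3)`, with `rows` a list of equal-length feature vectors and `labels` the matching emotion labels, the function returns the selected column indices together with the score they reach. Capping `max_dim` at 3 or 4 mirrors the low-dimensional feature spaces assumed in the abstract.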

Vowel Pronunciation Variability Assessment Using Poincare Maps
Descriptor Evaluation Methodology
Derivation of Feature Spaces
Emotion Classification
Experimental Evaluation of the Proposed Approach
Selection of the Feature-Subset Acceptance Threshold
Descriptor Performance Evaluation
Findings
Conclusions
