Abstract

This paper describes the Mexican Emotional Speech Database (MESD), which contains single-word emotional utterances for anger, disgust, fear, happiness, neutral and sadness in adult (male and female) and child voices. To validate the emotional prosody of the uttered words, a cubic-kernel Support Vector Machine (SVM) classifier was trained on prosodic, spectral and voice-quality features for each case study: (1) male adult, (2) female adult and (3) child. In addition, the cultural, semantic and linguistic shaping of emotional expression was assessed by statistical analysis. This study was registered at BioMed Central and is part of the implementation of a published study protocol. Mean emotion classification accuracies reached 93.3%, 89.4% and 83.3% for male adult, female adult and child utterances, respectively. Statistical analysis emphasized the shaping of emotional prosody by semantic and linguistic features, and a cultural variation in emotional expression was highlighted by comparing the MESD with the INTERFACE database for Castilian Spanish. The MESD provides reliable content for linguistic emotional prosody shaped by the Mexican cultural environment. To facilitate further investigations, two subsets are provided: a corpus controlled for linguistic features and emotional semantics, and one containing words repeated across voices and emotions. The MESD is made freely available.
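The validation classifier lends itself to a compact illustration. The sketch below trains a cubic (degree-3 polynomial kernel) SVM on simple per-utterance acoustic features; the feature set, function names and hyperparameters are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal sketch of the validation classifier: a cubic-kernel SVM over
# per-utterance prosodic/spectral features. Feature choices are assumptions.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def extract_features(wav_path, sr=16000):
    """Simple prosodic/spectral feature vector for one single-word utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope
    f0 = librosa.yin(y, fmin=60, fmax=500, sr=sr)        # pitch contour (prosody)
    rms = librosa.feature.rms(y=y)                       # energy contour
    zcr = librosa.feature.zero_crossing_rate(y)          # crude voice-quality proxy
    summarize = lambda m: np.hstack([np.mean(m, axis=-1), np.std(m, axis=-1)])
    return np.hstack([summarize(mfcc), summarize(f0),
                      summarize(rms), summarize(zcr)])

def train_and_validate(wav_paths, labels, cv=5):
    """Fit the cubic SVM and return mean cross-validated emotion accuracy."""
    X = np.vstack([extract_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
    return cross_val_score(clf, X, np.asarray(labels), cv=cv).mean()
```

Here `labels` would hold one of the six MESD emotion categories per utterance, and the classifier would be fitted separately for the male adult, female adult and child subsets, as in the validation procedure described above.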

Highlights

  • Support Vector Machine (SVM) performance increased by up to 32.3% when Mel-frequency magnitude coefficients were included among the features (see the sketch after these highlights)

  • We propose a cubic SVM model to assess the validity of the emotional content of (1) the initial repertory of 1152 single-word utterances for each type of voice, and (2) the 288-utterance dataset per type of voice integrated into the Mexican Emotional Speech Database (MESD)

  • The validation process yielded competitive speech emotion recognition performance when the MESD was compared with INTERFACE
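As a rough illustration of the Mel-frequency magnitude coefficients mentioned in the first highlight, the sketch below follows a common formulation in which a mel filter bank is applied to the magnitude (rather than power) spectrum and the logarithm is taken, with no DCT step; the exact variant and parameters used in the study are not reproduced here, so treat the filter-bank size and frame settings as assumptions.

```python
# Illustrative Mel-frequency magnitude coefficients: log of the mel-filtered
# magnitude spectrum (no DCT). Parameter values are assumptions.
import numpy as np
import librosa

def mel_frequency_magnitude_coefficients(y, sr, n_mels=40, n_fft=512, hop_length=256):
    """Log mel-filtered *magnitude* spectrum, shape (n_mels, n_frames)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))  # magnitude, not power
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # triangular filter bank
    return np.log(mel_fb @ S + 1e-10)  # small offset avoids log(0) in silent frames
```

These frame-level coefficients can then be summarized (for example, mean and standard deviation per mel band) and appended to the feature vector fed to the SVM.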



Introduction

Human–computer affective interactions have been extensively studied, and many applications have been developed in a wide variety of fields. A case in point is speech emotion recognition, which aims to design artificial-intelligence models able to predict human emotional states from the processing of affective voice signals [1]. Such systems are now useful in security, healthcare, videogaming and mobile communications, where the objective and rapid assessment of emotional states enriches the user environment [2].
