Simple Meta-optimization of the Feature MFCC for Public Emotional Datasets Classification

Jose Ramón Villar, Alberto Gallucci, Enrique De La Cal, Mario Koeppen, Kaori Yoshida

doi:10.1007/978-3-030-86271-8_55

Abstract

A Speech Emotion Recognition (SER) system can be defined as a collection of methodologies that process and classify speech signals to detect the emotions embedded in them [2]. Among the most critical issues to consider in an SER system are: i) defining the kinds of emotions to classify, ii) finding suitable datasets, iii) selecting the proper input features and iv) optimising the selected features. This work considers four well-known datasets from the literature: EmoDB, TESS, SAVEE and RAVDESS. The study focuses on designing a low-power SER algorithm that combines one prosodic feature with six spectral features to capture rhythm and frequency. The proposal compares eleven low-power Classical classification Machine Learning techniques (CML); the main novelty is the optimisation of the two main parameters of the MFCC spectral feature, n_mfcc and hop_length, through the Simulated Annealing (SA) meta-heuristic.
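To make the idea of tuning n_mfcc and hop_length with simulated annealing concrete, the following is a minimal sketch and not the authors' implementation. The search ranges, the cooling schedule, the neighbour move and the `toy_score` objective are all assumptions introduced for illustration. In a real pipeline the objective would presumably be a classifier's validation accuracy on MFCC features extracted with the candidate parameters.

```python
import math
import random

# Hypothetical search space; the abstract does not state the ranges used.
N_MFCC_RANGE = (10, 40)
HOP_LENGTH_CHOICES = [128, 256, 512, 1024]


def toy_score(n_mfcc, hop_length):
    """Stand-in objective (assumption): peaks at n_mfcc=20, hop_length=512.

    A real objective would extract MFCCs with these parameters, train a
    classifier, and return its validation accuracy.
    """
    return -((n_mfcc - 20) ** 2) / 100.0 - abs(math.log2(hop_length / 512))


def neighbour(state, rng):
    """Perturb exactly one of the two parameters by a small, clamped step."""
    n_mfcc, hop_idx = state
    if rng.random() < 0.5:
        n_mfcc += rng.choice([-2, -1, 1, 2])
        n_mfcc = min(max(n_mfcc, N_MFCC_RANGE[0]), N_MFCC_RANGE[1])
    else:
        hop_idx += rng.choice([-1, 1])
        hop_idx = min(max(hop_idx, 0), len(HOP_LENGTH_CHOICES) - 1)
    return n_mfcc, hop_idx


def simulated_annealing(score, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximise score(n_mfcc, hop_length); returns (n_mfcc, hop_length, best)."""
    rng = random.Random(seed)
    state = (rng.randint(*N_MFCC_RANGE), rng.randrange(len(HOP_LENGTH_CHOICES)))
    current = score(state[0], HOP_LENGTH_CHOICES[state[1]])
    best_state, best = state, current
    temp = t0
    for _ in range(steps):
        cand = neighbour(state, rng)
        cand_score = score(cand[0], HOP_LENGTH_CHOICES[cand[1]])
        delta = cand_score - current
        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            state, current = cand, cand_score
            if current > best:
                best_state, best = state, current
        temp *= cooling  # geometric cooling schedule (assumption)
    return best_state[0], HOP_LENGTH_CHOICES[best_state[1]], best


if __name__ == "__main__":
    print(simulated_annealing(toy_score))
```

Restricting hop_length to a handful of powers of two keeps the search space small, which fits the low-power framing, but it is only one possible design choice. The toy objective peaks at n_mfcc=20 and hop_length=512 by construction, so those values say nothing about the optimum on the real datasets.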

Full Text