Abstract

A Speech Emotion Recognition (SER) system can be defined as a collection of methodologies that process and classify speech signals to detect the emotions embedded in them [2]. Among the most critical issues to consider in an SER system are: i) defining the kinds of emotions to classify, ii) finding suitable datasets, iii) selecting the proper input features and iv) optimising the selected features. This work considers four well-known datasets from the literature: EmoDB, TESS, SAVEE and RAVDESS. The study focuses on designing a low-power SER algorithm that combines one prosodic feature with six spectral features to capture the rhythm and frequency of the speech signal. The proposal compares eleven low-power classical machine learning (CML) classification techniques; its main novelty is the optimisation of the two main parameters of the Mel-Frequency Cepstral Coefficient (MFCC) spectral feature, n_mfcc and hop_length, through the Simulated Annealing (SA) meta-heuristic.
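
As a rough illustration of the parameter search described above, the following minimal sketch applies a generic simulated annealing loop to the pair (n_mfcc, hop_length). It is not the authors' implementation: the function names, parameter ranges, cooling schedule and the `toy_score` objective are all assumptions made for this example. In a real pipeline the score would presumably be the validation accuracy of a classifier trained on MFCCs extracted with those two parameters (for instance via `librosa.feature.mfcc`).

```python
import math
import random


def simulated_annealing(score, bounds, start, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Maximise score(n_mfcc, hop_length) over integer ranges (illustrative only)."""
    rng = random.Random(seed)
    current = start
    current_score = score(*current)
    best, best_score = current, current_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour: perturb each parameter, then clamp it to its range.
        candidate = tuple(
            min(high, max(low, value + rng.randint(-step, step)))
            for value, (low, high, step) in zip(current, bounds)
        )
        candidate_score = score(*candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_score


def toy_score(n_mfcc, hop_length):
    # Hypothetical stand-in for validation accuracy; its maximum is at (40, 512).
    return -((n_mfcc - 40) / 40) ** 2 - ((hop_length - 512) / 512) ** 2


# (low, high, max step) for n_mfcc and hop_length; these ranges are assumed.
bounds = [(13, 128, 4), (128, 1024, 32)]
best, best_score = simulated_annealing(toy_score, bounds, start=(13, 128))
print("best (n_mfcc, hop_length):", best, "score:", best_score)
```

Replacing `toy_score` with a function that extracts features, trains one of the classifiers and returns its validation accuracy would turn this into a real, if slow, search; the fixed seed only keeps the run reproducible.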
