Abstract

In recent years, the virtual assistant has become an essential part of many applications on smart devices. In these applications, users talk to virtual assistants in order to give commands, which makes speech emotion recognition an important problem for improving the service and quality of virtual assistants. However, speech emotion recognition is not a straightforward task, as emotion can be expressed through various features, and a deep understanding of these features is crucial to achieving good results. To this end, this paper conducts empirical experiments on three kinds of speech features: Mel-spectrogram, Mel-frequency cepstral coefficients (MFCCs), Tempogram, and their variants, for the task of speech emotion recognition. Convolutional Neural Networks, Long Short-Term Memory, a Multi-layer Perceptron classifier, and Light Gradient Boosting Machine are used to build classification models for the emotion classification task based on the three speech features. Two popular datasets, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D), are used to train these models.

Keywords: MFCCs, Mel-spectrogram, Tempogram, Speech emotion
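
As a concrete illustration of the three feature families named in the abstract, the following minimal sketch extracts a Mel-spectrogram, MFCCs, and a Tempogram from a single audio file using librosa. The specific parameter choices (n_mels=128, n_mfcc=40, default tempogram window) are illustrative assumptions, not the settings used in the paper.

    import librosa

    def extract_features(path, sr=22050):
        """Extract the three feature families discussed in the paper
        (Mel-spectrogram, MFCCs, Tempogram) from one audio file."""
        y, sr = librosa.load(path, sr=sr)

        # Mel-spectrogram, converted to log (dB) scale as is common practice
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
        log_mel = librosa.power_to_db(mel)

        # Mel-frequency cepstral coefficients
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

        # Tempogram: local autocorrelation of the onset strength envelope
        onset_env = librosa.onset.onset_strength(y=y, sr=sr)
        tempogram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)

        return log_mel, mfcc, tempogram

Each returned array is a time-frequency (or time-tempo) matrix, which can be fed to sequence models such as CNNs or LSTMs, or summarized into fixed-length statistics for classifiers such as an MLP or LightGBM.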
