Abstract

Automated speech emotion recognition (SER) has attracted research interest for three decades because of its wide range of real-world applications. It can improve human-machine interaction, online marketing and education, customer relations, medical treatment, safe driving, online search, and more. Researchers have adopted various methods to improve emotion recognition from speech signals, such as different combinations of features (acoustic, non-acoustic, or both) and classifiers (machine learning, deep learning, or both). Our study aims to improve emotion classification performance using features based on the multi-resolution variational mode decomposition (MRVMD) method. We first decompose each signal frame into several sub-signals, known as modes or intrinsic mode functions (IMFs), using MRVMD. We then extract the proposed features, multi-resolution variational mode mel-frequency cepstral coefficients (MRVMMFCC), multi-resolution variational mode approximate entropy (MRVMAE), and multi-resolution variational mode permutation entropy (MRVMPE), from the MRVMD-decomposed IMFs. Finally, different combinations of the proposed features are used to classify emotions with a deep neural network (DNN) classifier. The experimental results show that the combination of all three proposed features (MRVMMFCC + MRVMAE + MRVMPE) outperforms the other combinations in recognizing emotions from speech signals. The proposed feature combination with a DNN classifier achieved emotion classification accuracies of 83.4%, 85.01%, and 90.51% on the SAVEE, EMOVO, and EMO-DB datasets, respectively. The proposed MRVMD-based method also performed better than state-of-the-art methods.
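
As a rough illustration of the per-frame feature pipeline summarized above, the sketch below decomposes one speech frame into modes and extracts per-mode MFCC, approximate-entropy, and permutation-entropy features. It is only a sketch under loose assumptions: plain variational mode decomposition (via the `vmdpy` package) stands in for the paper's multi-resolution variant, whose exact construction is not given in the abstract, and the mode count `K`, VMD penalty `alpha`, MFCC order, and entropy parameters are illustrative choices, not the authors' settings.

```python
# Minimal sketch of the per-frame feature extraction described above.
# NOTE: plain VMD (vmdpy) stands in for the paper's multi-resolution
# variant (MRVMD); K, alpha, n_mfcc, and entropy settings are assumptions.
import numpy as np
import librosa                      # MFCC computation
from vmdpy import VMD               # standard variational mode decomposition
from antropy import app_entropy, perm_entropy

def frame_features(frame: np.ndarray, sr: int, K: int = 4) -> np.ndarray:
    """Decompose one speech frame into K modes (IMFs) and build a
    per-mode MFCC + approximate-entropy + permutation-entropy vector."""
    # VMD returns (modes, mode spectra, centre frequencies); modes: shape (K, N).
    modes, _, _ = VMD(frame, alpha=2000, tau=0.0, K=K, DC=0, init=1, tol=1e-7)
    parts = []
    for imf in modes:
        # MRVMMFCC-style coefficients: mean MFCC vector of the mode.
        mfcc = librosa.feature.mfcc(y=imf.astype(np.float32), sr=sr,
                                    n_mfcc=13, n_fft=256, hop_length=64)
        parts.append(mfcc.mean(axis=1))
        # MRVMAE- and MRVMPE-style features: one entropy value per mode.
        parts.append([app_entropy(imf)])
        parts.append([perm_entropy(imf, normalize=True)])
    return np.concatenate([np.ravel(p) for p in parts])

# Example: a 64 ms frame of synthetic "speech" at 16 kHz (even length for VMD).
feats = frame_features(np.random.randn(1024), sr=16000)
print(feats.shape)   # (K * (13 + 1 + 1),) = (60,) with K = 4
```

In a full system, the per-frame vectors produced this way would be aggregated per utterance and fed to the DNN classifier mentioned in the abstract.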
