Abstract

Emotion recognition from speech signals is growing in popularity because of its many practical applications, yet it remains a complex and challenging task. It plays an essential role in enhancing human-computer interaction (HCI). Many authors have used different methods to improve the accuracy of speech emotion recognition (SER); proper feature selection and a suitable machine or deep learning model design can improve the recognition rate. In this work, we use a modified version of the mel frequency cepstral coefficient (MFCC) feature, named the mel frequency magnitude coefficient (MFMC), with convolutional neural network (CNN) and deep neural network (DNN) classifiers to enhance SER. We used MFMC and MFCC features as input to CNN and DNN classifiers and evaluated the recognition accuracy. Two observations emerged from our experiments. First, the MFMC feature outperforms the MFCC feature in SER for both classifiers. Second, the proposed DNN classifier achieves better accuracy than the CNN classifier for both features (MFMC and MFCC). The MFMC feature with the DNN classifier achieved accuracies of 76.72%, 84.72%, 77.88%, and 100% on the RAVDESS, EMODB, SAVEE, and TESS datasets, respectively. Similarly, the CNN classifier with the MFMC feature achieved accuracies of 72.9%, 82.41%, 74.5%, and 100% on the same datasets. We compared our proposed work with state-of-the-art models and found that our model performed better.
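To make the pipeline concrete, the sketch below shows one plausible way to extract the two features and feed them to a fully connected classifier. It is a minimal illustration only, assuming librosa and TensorFlow/Keras as tooling: the MFMC computation here (a mel filter bank applied to the magnitude spectrum instead of the power spectrum) is our reading of the feature's name, not the paper's verified definition, the layer sizes are illustrative rather than the reported architecture, and "speech.wav" is a hypothetical input file.

    import numpy as np
    import librosa
    import tensorflow as tf

    def mfcc_features(y, sr, n_coeff=40):
        # Standard MFCCs via librosa, averaged over time
        # to obtain one fixed-length vector per utterance.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coeff)
        return mfcc.mean(axis=1)

    def mfmc_features(y, sr, n_mels=40, n_fft=1024):
        # Assumed MFMC variant: mel filter bank applied to the
        # magnitude spectrum |STFT| rather than the power spectrum.
        # Consult the paper for the exact definition.
        mag = np.abs(librosa.stft(y, n_fft=n_fft))
        mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
        mel_mag = mel_fb @ mag
        return np.log(mel_mag + 1e-9).mean(axis=1)

    def build_dnn(input_dim, num_classes):
        # Generic fully connected classifier; the sizes below are
        # placeholders, not the architecture reported in the paper.
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])

    # Usage: one feature vector per utterance, then a standard
    # classification setup (e.g. 8 emotion classes in RAVDESS).
    y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical file
    x = mfmc_features(y, sr)
    model = build_dnn(input_dim=x.shape[0], num_classes=8)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

Swapping mfmc_features for mfcc_features in this setup is what allows the two features to be compared under the same classifier, which is the comparison the abstract reports.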
