Abstract

As an important part of our daily life, speech has a great impact on the way people communicate. The Mel filter bank used in the extraction process of MFCC has a better ability to process the low-frequency component of a speech signal, but it weakens the emotional information contained in the high-frequency part of the speech signal. We used the inverted Mel filter bank to enhance the feature processing of the high-frequency part of the speech signal to obtain the IMFCC coefficients and fuse the MFCC features in order to obtain I_MFCC. Finally, to more accurately characterize emotional traits, we combined the Teager energy operator coefficients (TEOC) and the I_MFCC to obtain TEOC&I_MFCC and input it into the CNN_LSTM neural network. Experimental results on RAVDESS show that the feature fusion using Teager energy operator coefficients and I_MFCC has a higher emotion recognition accuracy, and the system achieves 92.99% weighted accuracy (WA) and 92.88% unweighted accuracy (UA).

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call