Abstract

This paper compares and analyzes the training performance of a Convolutional Recurrent Neural Network (CRNN) and a Convolutional Neural Network (CNN) for speech emotion recognition. To address the problem that a CNN does not capture temporal information, while a purely temporal model cannot adequately represent spatial information, a CRNN is applied to speech emotion recognition. Using Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) as the input features of the model, the recognition performance of the CRNN and the CNN is compared and analyzed. The results show that the CRNN achieves higher accuracy with both feature sets, which effectively improves the modeling capability of the speech emotion model and provides a theoretical basis and an optimization direction for further improving the accuracy of speech emotion recognition.
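
The abstract does not specify the network configuration, so the following is only a minimal sketch of the CRNN idea it describes: a convolutional front end extracts local spectral patterns from the MFCC/GFCC feature map, and a recurrent layer models how those patterns evolve over time. The framework (PyTorch), layer sizes, number of coefficients, and number of emotion classes are illustrative assumptions, not the paper's actual setup.

```python
# Illustrative CRNN sketch for speech emotion recognition (assumptions only;
# not the configuration reported in the paper).
import torch
import torch.nn as nn


class CRNN(nn.Module):
    """CNN block learns local spectral (spatial) patterns from MFCC/GFCC
    frames; an LSTM then models their temporal evolution."""

    def __init__(self, n_features=40, n_classes=7, hidden_size=128):
        super().__init__()
        # Convolutional block: input shape (batch, 1, time, n_features)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),          # pool along the feature axis only
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        # Recurrent block: one flattened feature vector per time frame
        self.rnn = nn.LSTM(
            input_size=64 * (n_features // 4),
            hidden_size=hidden_size,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, time, n_features) -> add a channel dimension
        x = x.unsqueeze(1)
        x = self.cnn(x)                     # (batch, 64, time, n_features // 4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        out, _ = self.rnn(x)                # (batch, time, 2 * hidden_size)
        return self.classifier(out[:, -1])  # classify from the last frame


if __name__ == "__main__":
    # Example: a batch of 8 utterances, 200 frames of 40-dim MFCC features
    feats = torch.randn(8, 200, 40)
    logits = CRNN()(feats)
    print(logits.shape)                     # torch.Size([8, 7])
```

A plain CNN baseline would keep only the convolutional block followed by global pooling and the classifier; the comparison in the paper amounts to whether adding the recurrent stage over the frame sequence improves emotion classification for both MFCC and GFCC inputs.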
