Abstract

With the development of deep learning, convolutional neural networks (CNNs) have been widely applied in the field of emotion recognition. The key to enhancing the performance of a singing emotion recognition system is to select suitable features and establish reliable models. Mel-frequency cepstral coefficient (MFCC) features have proved effective in recognizing emotions. In this paper, therefore, a CNN is used to build the model of a singing emotion recognition system, and MFCCs are used for feature extraction. To improve the accuracy of the system, the feature matrices are segmented into small slices, and majority voting is used in the test phase to identify the emotion. To verify the generalization of the system, this paper provides two approaches to model building: one builds separate models for male and female speakers, and the other builds a single mixed model. The accuracy of the singing emotion recognition system improves under both approaches and is not influenced by the choice between separate and mixed models.
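
The slicing and majority-vote steps can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the slice width, the non-overlapping hop, and the `predict_slice` callable (a stand-in for a trained CNN) are all assumptions made for illustration, and the abstract does not state the slice size or whether slices overlap.

```python
from collections import Counter

import numpy as np


def slice_features(mfcc, slice_len, hop=None):
    """Split an (n_coeffs, n_frames) MFCC matrix into fixed-width slices.

    slice_len and hop are hypothetical parameters; hop defaults to
    slice_len, which gives non-overlapping slices.
    """
    hop = hop or slice_len
    n_frames = mfcc.shape[1]
    # Trailing frames that do not fill a whole slice are dropped.
    return [mfcc[:, start:start + slice_len]
            for start in range(0, n_frames - slice_len + 1, hop)]


def majority_vote(labels):
    """Return the most frequent label among the per-slice predictions."""
    return Counter(labels).most_common(1)[0][0]


def predict_recording(mfcc, predict_slice, slice_len):
    """Label a whole recording by voting over its slice-level predictions.

    predict_slice is a hypothetical callable that maps one slice to an
    emotion label; in practice it would wrap the trained classifier.
    """
    slices = slice_features(mfcc, slice_len)
    if not slices:
        raise ValueError("recording is shorter than one slice")
    return majority_vote([predict_slice(s) for s in slices])


if __name__ == "__main__":
    # Placeholder input: 13 coefficients over 100 frames, not real audio.
    demo = np.zeros((13, 100))
    print(len(slice_features(demo, 25)), "slices")
    print(majority_vote(["happy", "sad", "happy", "happy"]))
```

Voting over many short slices means that a few misclassified slices need not change the label assigned to the whole recording, which is one plausible reading of why the abstract reports an accuracy gain.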
