Abstract

Aiming at the problems of music emotion classification, a music emotion recognition method based on the convolutional neural network is proposed. First, the mel-frequency cepstral coefficient (MFCC) and residual phase (RP) are weighted and combined to extract the audio low-level features of music, so as to improve the efficiency of data mining. Then, the spectrogram is input into the convolutional recurrent neural network (CRNN) to extract the time-domain features, frequency-domain features, and sequence features of audio. At the same time, the low-level features of audio are input into the bidirectional long short-term memory (Bi-LSTM) network to further obtain the sequence information of audio features. Finally, the two parts of features are fused and input into the softmax classification function with the center loss function to achieve the recognition of four music emotions. The experimental results based on the emotion music dataset show that the recognition accuracy of the proposed method is 92.06%, and the value of the loss function is about 0.98, both of which are better than other methods. The proposed method provides a new feasible idea for the development of music emotion recognition.
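
To make the first step concrete, here is a minimal Python sketch of the weighted MFCC + residual-phase (RP) combination described above, using librosa and SciPy. The weights, LPC order, frame settings, and per-frame averaging of the RP signal are illustrative assumptions, not values from the paper.

```python
import numpy as np
import librosa
from scipy.signal import hilbert, lfilter

def extract_low_level_features(path, w_mfcc=0.6, w_rp=0.4,
                               sr=22050, n_mfcc=20, lpc_order=12):
    """Weighted MFCC + residual-phase features (hedged sketch)."""
    y, _ = librosa.load(path, sr=sr)

    # MFCCs per frame (default hop length of 512 samples).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, T)

    # Residual phase: cosine of the analytic-signal phase of the
    # linear-prediction residual.
    a = librosa.lpc(y, order=lpc_order)
    residual = lfilter(a, [1.0], y)          # inverse LP filtering
    rp = np.cos(np.angle(hilbert(residual)))

    # Frame the sample-level RP so it lines up with the MFCC frames,
    # then summarize each frame by its mean (a simple choice).
    frames = librosa.util.frame(rp, frame_length=2048, hop_length=512)
    rp_frames = frames.mean(axis=0, keepdims=True)           # (1, T')

    # Weighted combination; truncate to the shorter frame count.
    t = min(mfcc.shape[1], rp_frames.shape[1])
    return np.vstack([w_mfcc * mfcc[:, :t], w_rp * rp_frames[:, :t]])
```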

Highlights

  • With the rapid development of computer networks and multimedia technology, more and more multimedia data such as text, image, audio, and video emerge on the internet. The management and analysis of media data have become a hot issue [1].

  • Compared with traditional emotion classification methods, the innovations of the proposed method are as follows: (1) in order to obtain more comprehensive music features, the proposed method uses the convolutional recurrent neural network (CRNN) and bidirectional long short-term memory (Bi-LSTM) network to extract audio sequence features and context information, respectively, and fuses them for emotion classification to improve the accuracy of emotion recognition (a minimal sketch of this two-branch fusion appears after this list).

  • Because the method uses CRNN and Bi-LSTM to extract complementary music features and feeds the fused result into the improved softmax classification function to realize music emotion recognition, the recognition accuracy is well guaranteed. The improved loss function reduces the distance within feature classes, increases the discrimination between different classes, and enhances the adaptability to fine-grained image classification. The approach in [14] realizes music emotion recognition based on human physiological characteristics, in which support vector machines, decision trees, K-nearest neighbors, and multilayer perceptrons with different kernels are used to classify music; however, because it relies too heavily on human characteristics, the method in [14] is not sensitive to emotion and its overall classification performance is poor.
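
As a concrete reading of the two-branch design above, the following PyTorch sketch (not the authors' implementation) encodes the spectrogram with a small CNN-plus-GRU CRNN, encodes the low-level feature sequence with a Bi-LSTM, concatenates the two embeddings, and classifies four emotions. All layer sizes, and the use of a GRU inside the CRNN, are assumptions.

```python
import torch
import torch.nn as nn

class EmotionNet(nn.Module):
    def __init__(self, n_feats=21, n_classes=4):
        super().__init__()
        # CRNN branch: CNN front end over the spectrogram, GRU on top.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.gru = nn.GRU(input_size=64, hidden_size=64, batch_first=True)
        # Bi-LSTM branch over the weighted MFCC+RP sequence.
        self.bilstm = nn.LSTM(input_size=n_feats, hidden_size=64,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(64 + 128, n_classes)

    def forward(self, spec, feats):
        # spec: (B, 1, F, T) spectrogram; feats: (B, T2, n_feats).
        x = self.cnn(spec)                        # (B, 64, F/4, T/4)
        x = x.mean(dim=2).transpose(1, 2)         # (B, T/4, 64), time-major
        _, h = self.gru(x)                        # h: (1, B, 64)
        _, (h2, _) = self.bilstm(feats)           # h2: (2, B, 64)
        fused = torch.cat([h[-1], h2[0], h2[1]], dim=1)   # (B, 192)
        return self.fc(fused)   # logits; softmax is applied in the loss
```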


Summary

Introduction

With the rapid development of computer networks and multimedia technology, more and more multimedia data such as text, image, audio, and video emerge on the internet. Based on the above analysis, and aiming at the problem that single-mode data cannot fully express music emotion, a music emotion recognition method using the convolutional neural network is proposed. Its innovations are as follows: (1) in order to obtain more comprehensive music features, the proposed method uses the convolutional recurrent neural network (CRNN) and bidirectional long short-term memory (Bi-LSTM) network to extract audio sequence features and context information, respectively, and fuses them for emotion classification to improve the accuracy of emotion recognition; (2) the improved loss function combines the center loss and the distance between classes, which can improve the distinguishability of features, ensure the reduction of the distance within feature classes, increase the discrimination between different classes, and enhance the adaptability of fine-grained image classification.
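
Point (2) can be read as standard softmax cross-entropy plus the classic center-loss term, L = L_softmax + (λ/2) Σ_i ||x_i − c_{y_i}||², extended with a term that keeps class centers apart. The PyTorch sketch below is a hedged reconstruction; the weights `lam` and `mu`, the margin, and the feature dimension are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterSoftmaxLoss(nn.Module):
    """Cross-entropy + center loss + inter-class separation (sketch)."""
    def __init__(self, n_classes=4, feat_dim=192,
                 lam=0.5, mu=0.1, margin=1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.lam, self.mu, self.margin = lam, mu, margin

    def forward(self, feats, logits, labels):
        # Softmax cross-entropy on the fused logits.
        ce = F.cross_entropy(logits, labels)
        # Center term: pull each feature toward its own class center.
        intra = ((feats - self.centers[labels]) ** 2).sum(dim=1).mean() / 2
        # Inter-class term (assumption): hinge on pairwise center distances
        # so that different class centers stay at least `margin` apart.
        d = torch.cdist(self.centers, self.centers)          # (C, C)
        off_diag = ~torch.eye(len(self.centers), dtype=torch.bool,
                              device=d.device)
        inter = F.relu(self.margin - d[off_diag]).mean()
        return ce + self.lam * intra + self.mu * inter
```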

Overall Framework
Audio Low-Level Feature Extraction
CNN Emotion Classification
Local and Sequential Feature Extraction
Sequence Feature Extraction
Softmax Classifier
Introduce Variation of the Center Loss Function
Data Preprocessing
Audio Time Period Selection
Influence of Iteration Times on the
Experimental Comparison of Different
Emotion Classification Confusion Matrix
Experimental Comparison of Different Classification Models
Findings
Conclusion