Abstract

The main difficulty of music emotion recognition is the lack of sufficient labeled data: only labeled data with unbalanced categories are available to train the emotion recognition model. Not only is accurate labeling of emotion categories costly and time-consuming, but it also requires labelers to have an extensive musical background. At the same time, the emotion of music is affected by many factors: singing methods, music styles, arrangement methods, lyrics, and other factors all influence the expression of musical emotion. This paper proposes a multimodal method based on the combination of knowledge distillation and music style transfer learning and verifies its effectiveness on 20,000 songs. Experiments show that, compared with traditional methods such as single-audio, single-lyric, and simple audio-plus-lyric multimodal methods, the proposed method significantly improves both the accuracy of emotion recognition and the generalization ability.
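The abstract names knowledge distillation but does not spell out its form. As a minimal illustration, the sketch below implements the standard temperature-scaled student-teacher distillation loss (after Hinton et al.), assuming a PyTorch setup; the temperature T, the mixing weight alpha, and the way soft and hard targets are combined are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Standard temperature-scaled knowledge-distillation loss (Hinton et al.).
    The paper's exact formulation is not given here; T and alpha are assumptions."""
    # Soft targets: match the student's tempered distribution to the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the magnitude of the hard loss
    # Hard targets: ordinary cross-entropy on the (possibly scarce) emotion labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage with random tensors (8 examples, 4 emotion classes):
s, te = torch.randn(8, 4), torch.randn(8, 4)
y = torch.randint(0, 4, (8,))
loss = distillation_loss(s, te, y)
```

In a setting like the one the abstract describes, the teacher could be a model trained on a source style or a richer labeled set, with the student trained on the scarce emotion-labeled data; that mapping is an interpretation, not something the excerpt states.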

Highlights

  • Music works contain rich human emotions, and emotions play an indispensable role in the transmission of musical emotions and in the understanding and appreciation of music [1–3]

  • Deep learning technology has replaced traditional statistical algorithms as the mainstream technology in the field of automatic music emotion recognition [4–6]. The main content of music includes digital audio and lyrics text, and current research on music emotion recognition mainly focuses on these two aspects

  • Building on the preceding analysis of existing multimodal emotion recognition methods, the multimodal emotion model used in this article has significant advantages compared with traditional single-audio or single-lyric methods


Introduction

Music works contain rich human emotions, and emotions play an indispensable role in the transmission of musical emotions and in the understanding and appreciation of music [1–3]. Given the large number of music works, how to recommend suitable music according to different environments and different moods of users has become a hot research topic in recent years. In this context, the automatic recognition of music emotion has attracted more and more attention from the industry. The main content of music includes digital audio and lyrics text, and current research on music emotion recognition mainly focuses on these two aspects. Based on a multimodal music emotion recognition method that combines knowledge distillation and transfer learning, this article attempts to effectively improve the accuracy of music emotion recognition when the amount of labeled data is small or the emotion categories are unbalanced; the resulting model is then used to classify different musical emotions. The specific operations are as follows: (1) the model divides the music into frames of a fixed length and applies the Fourier transform to convert them into frequency-domain signals; (2) the model takes the logarithm of the frequency-domain signals and performs an inverse Fourier transform [28–31].
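To make steps (1) and (2) concrete, the following is a minimal sketch of this frame-wise pipeline in Python/NumPy: each fixed-length frame is Fourier-transformed, the logarithm of the spectrum is taken, and an inverse Fourier transform is applied, which yields a real cepstrum. The frame length, hop size, and Hann window are illustrative assumptions; the paper's exact parameters are not given in this excerpt.

```python
import numpy as np

def cepstral_frames(signal, frame_len=2048, hop=512):
    """Steps (1)-(2): frame the signal, FFT to the frequency domain,
    take the log magnitude, then inverse-FFT back (real cepstrum).
    frame_len and hop are illustrative assumptions, not the paper's values."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        spectrum = np.fft.rfft(frame)               # (1) time -> frequency domain
        log_mag = np.log(np.abs(spectrum) + 1e-10)  # (2) logarithm (epsilon avoids log 0)
        cepstrum = np.fft.irfft(log_mag)            # (2) inverse Fourier transform
        frames.append(cepstrum)
    return np.stack(frames)                         # shape: (num_frames, frame_len)

# Usage: one second of a synthetic 440 Hz tone sampled at 22.05 kHz.
sr = 22050
t = np.arange(sr) / sr
features = cepstral_frames(np.sin(2 * np.pi * 440 * t))
print(features.shape)  # (40, 2048)
```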

Data Preprocessing
Student-Teacher Model and Transfer Learning Methods
Data Enhancement
Experimental Results and Analysis
