Abstract

The field of music emotion recognition (MER) has expanded rapidly over the last decade. Many new methods and audio features have been developed to improve the performance of MER algorithms. However, comparing the performance of these methods is difficult because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set we release, the MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations at 2 Hz time resolution for 1,802 songs and song excerpts licensed under Creative Commons). Using DEAM, we organized the ‘Emotion in Music’ task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted a total of 21 active teams. We analyze the results of the benchmark, including the winning algorithms and feature sets, and describe the design of the benchmark, the evaluation procedures, and the data cleaning and transformations that we suggest. The results of the benchmark suggest that recurrent neural network based approaches combined with large feature sets work best for dynamic MER.
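To make the abstract's finding concrete, the following is a minimal sketch of the kind of recurrent approach the benchmark favored: an LSTM that maps a sequence of frame-level audio features to per-frame valence and arousal values, matching DEAM's 2 Hz dynamic annotations. It is illustrative only, not any participating team's actual system; the feature dimensionality (260) and hidden size (64) are assumptions.

```python
import torch
import torch.nn as nn

class DynamicMER(nn.Module):
    """LSTM regressor: frame-level audio features -> per-frame (valence, arousal)."""
    def __init__(self, n_features=260, hidden_size=64):  # dimensions are illustrative
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # one valence and one arousal value

    def forward(self, x):         # x: (batch, time, n_features), one frame per 0.5 s
        out, _ = self.rnn(x)      # hidden state at every time step
        return self.head(out)     # (batch, time, 2): a prediction per annotation frame

model = DynamicMER()
frames = torch.randn(1, 90, 260)  # one 45-second excerpt annotated at 2 Hz
va = model(frames)                # shape (1, 90, 2)
```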

Highlights

  • Music emotion recognition (MER) is a young but fast-expanding field, stimulated by interest from the music industry in improving automatic music categorization for large-scale online music collections

  • We used two evaluation metrics to compare the performance of different methods: Pearson’s correlation coefficient between the ground truth and the predicted values for each song, averaged across songs, and the root mean square error (RMSE), averaged the same way

  • The RMSE measures how far the predicted emotion is from the true emotion of the song, while the correlation measures whether the direction of change is predicted correctly (see the sketch after this list)
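The two metrics described above can be computed as follows. This is a minimal sketch of the evaluation as the text describes it, not the benchmark's official scoring script; the function name and input layout are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(truth_per_song, pred_per_song):
    """Per-song Pearson's r and RMSE, each averaged across songs.

    Inputs are lists with one 1-D array per song, holding the dynamic
    annotations (e.g. arousal sampled at 2 Hz).
    """
    rs, rmses = [], []
    for y_true, y_pred in zip(truth_per_song, pred_per_song):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        r, _ = pearsonr(y_true, y_pred)                          # direction of change
        rmses.append(np.sqrt(np.mean((y_true - y_pred) ** 2)))   # distance from truth
        rs.append(r)
    return float(np.mean(rs)), float(np.mean(rmses))

# Toy usage: two songs with three annotation frames each.
avg_r, avg_rmse = evaluate(
    [np.array([0.1, 0.2, 0.3]), np.array([-0.2, -0.1, 0.0])],
    [np.array([0.0, 0.2, 0.4]), np.array([-0.3, -0.2, 0.1])],
)
```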


Introduction

Music emotion recognition (MER) is a young but fast-expanding field, stimulated by interest from the music industry in improving automatic music categorization for large-scale online music collections. A wide variety of categorical and dimensional emotional models are in use, such as basic emotions [4], the valence-arousal model [5,6,7,8], and the Geneva Emotional Music Scales (GEMS). The only other existing benchmark for MER methods is the audio mood classification (AMC) task, organized annually by the Music Information Retrieval Evaluation eXchange (MIREX, http://www.music-ir.org/mirex/wiki/) [11]. In this task, 600 audio files are provided to the participants, who agree not to distribute the files for commercial purposes. The benchmark uses five discrete emotion clusters, derived from a cluster analysis of online tags, instead of the more widely accepted emotional models.

