Abstract

Accurate tag classification of music clips has attracted considerable attention recently, because it allows various music excerpts, including unpopular ones, to be recommended to users based on the clips' acoustic similarities. Given a user's preferred music, acoustic features are extracted and fed into a classifier, which outputs a related tag used to recommend new music. Furthermore, the accuracy of tag classifiers can be improved by selecting the best feature subset for the domain to which the tag belongs. However, recent studies have struggled to compare classifiers fairly because they use different feature extractors. In this study, to enable a direct comparison of existing classification methods, we create 20 music datasets that share the same acoustic feature structure. In addition, we propose an effective evolutionary feature selection algorithm to evaluate the effectiveness of feature selection for tag classification. Our experiments demonstrate that the proposed method improves the accuracy of tag classification, and the analysis across multiple datasets provides valuable insights, such as which features are important for general music tag classification in the target domains.

Highlights

  • Automatic music tag classification (MTC) is used to find a relevant music tag, such as emotion or genre, for a music excerpt based on its music signal or extracted acoustic features [1]–[3]

  • Our experimental results indicate that an effective feature selection (FS) method can improve the performance of MTC consistently, regardless of the underlying setting

  • Since the main objective of this study was to conduct a fair experiment on FS, the acoustic features were extracted using the same feature extractor—MIRtoolbox [7]—which provides a set of integrated features for each music excerpt

Introduction

Automatic music tag classification (MTC) is used to find a relevant music tag, such as emotion or genre, for a music excerpt based on its music signal or extracted acoustic features [1]–[3]. This task can be achieved by training a classifier using music excerpts with relevant tags annotated by a human being. The typical workflow proceeds as follows. 1) Regarding application constraints such as response time, the music excerpts can be divided into clips with a specific duration. 2) An acoustic feature extractor is selected to transform the signal information of each music excerpt into a series of statistical values that form the formal dataset for training the classifier. 3) Since relevant acoustic features may vary according to the domain of the tags, a feature selection (FS) algorithm can be employed to improve classification accuracy. 4) The classifier learns the music excerpts based on the selected acoustic features with improved accuracy, because noisy features can be removed through the FS process.
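Step 3 is where the evolutionary FS algorithm operates. The paper's actual algorithm is not detailed in this excerpt; the following is a minimal sketch of evolutionary feature selection under assumed conventions (binary feature masks, truncation selection, one-point crossover, bit-flip mutation). The toy dataset, classifier, and all names are hypothetical, and the fitness function uses the resubstitution accuracy of a simple nearest-centroid classifier for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 200 clips x 30 acoustic features, 2 tag classes.
# Only the first 5 features carry tag information; the rest are noise.
n_clips, n_feats, n_informative = 200, 30, 5
y = rng.integers(0, 2, n_clips)
X = rng.normal(size=(n_clips, n_feats))
X[:, :n_informative] += y[:, None] * 2.0  # class-dependent shift

def accuracy(mask):
    """Fitness: nearest-centroid accuracy using only the selected features."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
    return float((pred == y).mean())

def evolve(pop_size=30, gens=40, p_mut=0.05):
    # Each individual is a boolean mask over the feature set.
    pop = rng.random((pop_size, n_feats)) < 0.5
    for _ in range(gens):
        fit = np.array([accuracy(m) for m in pop])
        order = np.argsort(fit)[::-1]
        parents = pop[order[: pop_size // 2]]          # truncation selection
        cuts = rng.integers(1, n_feats, pop_size // 2) # one-point crossover
        kids = np.array([np.concatenate([parents[i][:c],
                                         parents[(i + 1) % len(parents)][c:]])
                         for i, c in enumerate(cuts)])
        kids ^= rng.random(kids.shape) < p_mut         # bit-flip mutation
        pop = np.vstack([parents, kids])               # elitist replacement
    fit = np.array([accuracy(m) for m in pop])
    return pop[fit.argmax()], fit.max()

best_mask, best_acc = evolve()
```

Because the noise features only degrade the centroid distances, the search tends to converge on masks dominated by the informative features; a real implementation would evaluate fitness with cross-validation rather than resubstitution accuracy.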

