Abstract

Every type of musical data (audio, symbolic, lyrics, etc.) has its limitations, and cannot always capture all relevant properties of a particular musical category. In contrast to more typical MIR setups where supervised classification models are trained on only one or two types of data, we propose a more diversified approach to music classification and analysis based on six modalities: audio signals, semantic tags inferred from the audio, symbolic MIDI representations, album cover images, playlist co-occurrences, and lyric texts. Some of the descriptors we extract from these data are low-level, while others encapsulate interpretable semantic knowledge that describes melodic, rhythmic, instrumental, and other properties of music. With the intent of measuring the individual impact of different feature groups on different categories, we propose two evaluation criteria based on “non-dominated hypervolumes”: multi-group feature “importance” and “redundancy”. Both of these are calculated after the application of a multi-objective feature selection strategy using evolutionary algorithms, with a novel approach to optimizing trade-offs between both “pure” and “mixed” feature subsets. These techniques permit an exploration of how different modalities and feature types contribute to class discrimination. We use genre classification as a sample research domain to which these techniques can be applied, and present exploratory experiments on two disjoint datasets of different sizes, involving three genre ontologies of varied class similarity. Our results highlight the potential of combining features extracted from different modalities, and can provide insight into the relative significance of different modalities and features in different contexts.
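The “non-dominated hypervolume” underlying the two proposed statistics can be illustrated with a minimal two-objective sketch. Assume, purely for illustration, that each candidate feature subset is scored on two objectives to be minimized (e.g., classification error and relative subset size); the function names and reference point below are illustrative assumptions, not taken from the article:

```python
def pareto_front(points):
    """Return the non-dominated subset of 2-D points (both objectives minimized).

    A point p is dominated if some other point q is at least as good in
    both objectives.
    """
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

def hypervolume_2d(front, ref):
    """Area of objective space dominated by the front, bounded by a
    reference point `ref` that is worse than every front member."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(front):  # ascending in objective 1, descending in objective 2
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

# Illustrative trade-off points: (error, relative subset size)
candidates = [(0.2, 0.8), (0.5, 0.4), (0.6, 0.9)]
front = pareto_front(candidates)          # (0.6, 0.9) is dominated by (0.5, 0.4)
hv = hypervolume_2d(front, ref=(1.0, 1.0))
```

Comparing such hypervolumes between fronts obtained from “pure” (single-group) and “mixed” feature subsets is one way a group's importance or redundancy could be quantified; the article's exact formulation should be consulted for the definitions actually used.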

Highlights

  • Musical information can manifest in a variety of different modalities, each of which can be of greater or lesser interest to different types of domain experts, and with respect to different purposes

  • We focus in particular on exploring the ability of our novel methodologies to reveal statistical patterns associated with modalities and feature types; some of these patterns may have musicological or psychoacoustic salience, and some may be of use in improving automatic classification performance. Either way, it is our hope that such patterns can provide directed motivation for further multidisciplinary investigations

  • This article introduces two novel statistics based on multi-objective evolutionary feature selection and non-dominated hypervolumes


Summary

Introduction

Musical information can manifest in a variety of different modalities, each of which can be of greater or lesser interest to different types of domain experts, and with respect to different purposes. Some modalities encapsulate information that cannot be found in certain other modalities, and some overlap at least partially in what they can reveal. Even in the latter case, certain kinds of information can be extracted more readily from some modalities than from others; for example, individual melodies in polyphonic music are easier to segment from symbolic formats than from audio. This suggests significant potential gains in combining different modalities across a variety of MIR research areas.

