Abstract

Tagging music signals with semantic labels such as genres, moods and instruments is important for content-based music retrieval and recommendation. While considerable effort has been made, automatic music annotation is still considered challenging due to the difficulty of extracting good audio features that capture the characteristics of different tags. To address this issue, we present in this letter two exemplar-based approaches that represent the content of a music clip by referring to a large set of unlabeled audio exemplars. The first approach represents a music clip by the set of audio exemplars that are highly correlated with the short-time feature vectors of the clip, whereas the second approach represents a music clip as sparse linear combinations of its short-time feature vectors over the audio exemplars. Music annotation is then performed by learning the relevance of the audio exemplars to different tags using labeled data. These two approaches capitalize on the availability of unlabeled data to exploit the commonality of music signals and discover tag-specific acoustic patterns, without requiring domain knowledge or hand-crafted feature design. Evaluation on the CAL10k music genre tagging dataset for tag-based music retrieval shows that, with thousands of unlabeled audio exemplars randomly drawn from the Million Song Dataset, the proposed approaches lead to remarkably higher precision rates than existing approaches.
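The two clip-level representations described above can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes short-time feature vectors (e.g., MFCC frames) have already been extracted, uses cosine similarity as a stand-in for the correlation-based exemplar selection, and uses scikit-learn's SparseCoder (with orthogonal matching pursuit) for the sparse-coding variant; the exemplar dictionary, pooling strategy, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import SparseCoder


def correlation_representation(frames, exemplars, top_k=5):
    """Approach 1 (sketch): mark the exemplars most correlated with the
    clip's short-time feature vectors.

    frames    : (n_frames, d) short-time feature vectors of one clip
    exemplars : (n_exemplars, d) unlabeled audio exemplars
    Returns a binary (n_exemplars,) indicator of activated exemplars.
    """
    # Cosine similarity between every frame and every exemplar
    # (used here as a simple proxy for "highly correlated").
    f = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-12)
    e = exemplars / (np.linalg.norm(exemplars, axis=1, keepdims=True) + 1e-12)
    sim = f @ e.T                                    # (n_frames, n_exemplars)
    rep = np.zeros(exemplars.shape[0])
    # Each frame activates its top-k most similar exemplars.
    for row in sim:
        rep[np.argsort(row)[-top_k:]] = 1.0
    return rep


def sparse_coding_representation(frames, exemplars, n_nonzero=5):
    """Approach 2 (sketch): encode each frame as a sparse linear combination
    of the exemplars, then pool the frame-level codes over the clip."""
    coder = SparseCoder(dictionary=exemplars,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=n_nonzero)
    codes = coder.transform(frames)                  # (n_frames, n_exemplars)
    # Mean-pool the absolute sparse codes into one clip-level vector.
    return np.abs(codes).mean(axis=0)


# Hypothetical usage: 1,000 exemplars and a 500-frame clip in a 39-dim space.
rng = np.random.default_rng(0)
exemplars = rng.standard_normal((1000, 39))
clip_frames = rng.standard_normal((500, 39))
r1 = correlation_representation(clip_frames, exemplars)
r2 = sparse_coding_representation(clip_frames, exemplars)
# r1 and r2 would then be fed to a tag classifier trained on labeled clips.
```

In either case the clip is described by which unlabeled exemplars it activates, so the subsequent tag model only has to learn how relevant each exemplar is to each tag from the labeled data.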
