Melody-based retrieval in audio collections

Matija Marolt

doi:10.1121/1.2942557

Abstract

Mid-level representations are increasingly used to bridge the gap between low-level audio representations and high-level (semantic) ones. This work introduces a mid-level representation that integrates the melodic and rhythmic aspects of a music signal. It is formed by first performing multi-pitch detection on consecutive audio frames and then searching for dominant melodic lines within the detected pitches. Beat tracking is also performed to yield a beat-synchronous representation that is independent of tempo variations. The representation serves as the basis for a melody-based audio retrieval system. An approximate nearest-neighbor search algorithm compares sections of the query's mid-level representation to indexed sections of the mid-level representations of songs in a collection. Symbolic queries may also be used. A locality-sensitive hashing function for cosine similarity is used for indexing, and the point location in equal balls (PLEB) algorithm for searching. Results are ranked according to the number of matched sections. Retrieval was tested on the cover song identification task, where the goal is to retrieve different interpretations of a song from a collection. Results on a collection of 2,400 songs are presented. [Work supported by ARRS.]
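The indexing and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the random-hyperplane family as the hashing function for cosine similarity, treats each section as a plain list of floats, and replaces the point location in equal balls search with a simple bucket lookup over several hash tables. All function names and parameters (`make_tables`, `hash_section`, `build_index`, `rank_songs`, `n_bits`, `n_tables`) are hypothetical.

```python
import random
from collections import defaultdict


def make_tables(dim, n_bits, n_tables, seed=0):
    """Draw n_tables sets of n_bits random Gaussian hyperplanes (assumed LSH family)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        for _ in range(n_tables)
    ]


def hash_section(vec, planes):
    """Hash a section to a bit tuple: one bit per hyperplane, the sign of the dot product."""
    return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes)


def build_index(songs, tables):
    """Index every section of every song; songs maps song id -> list of section vectors."""
    index = [defaultdict(set) for _ in tables]
    for song_id, sections in songs.items():
        for vec in sections:
            for t, planes in enumerate(tables):
                index[t][hash_section(vec, planes)].add(song_id)
    return index


def rank_songs(query_sections, index, tables):
    """Rank songs by the number of query sections that collide with one of their sections."""
    votes = defaultdict(int)
    for vec in query_sections:
        matched = set()
        for t, planes in enumerate(tables):
            matched |= index[t].get(hash_section(vec, planes), set())
        for song_id in matched:
            votes[song_id] += 1  # at most one vote per song per query section
    # Highest vote count first; ties broken by song id for a stable order.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy data (made up for illustration): two "songs" with three 4-dimensional sections each.
    songs = {
        "song_a": [[1.0, 0.2, 0.0, 0.5], [0.0, 1.0, 0.3, 0.1], [0.4, 0.4, 1.0, 0.0]],
        "song_b": [[-1.0, 0.1, 0.9, -0.5], [0.2, -1.0, 0.0, 0.7], [-0.3, 0.6, -1.0, 0.2]],
    }
    tables = make_tables(dim=4, n_bits=8, n_tables=4)
    index = build_index(songs, tables)
    # A louder rendition of song_a: scaling a vector leaves its cosine similarity unchanged.
    query = [[2.0 * x for x in vec] for vec in songs["song_a"]]
    print(rank_songs(query, index, tables))
```

Because each hash bit depends only on the sign of a dot product, a section and any positively scaled copy of it fall into the same bucket, which is what makes this family suitable for cosine similarity. Using more bits per table makes collisions stricter, while more tables raise the chance that similar sections collide at least once.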
