Abstract

We address the problem of scalable content-based search in large collections of music documents. Music content is highly complex and versatile and presents multiple facets that can be considered independently or in combination. Moreover, music documents can be digitally encoded in many ways. We propose a general framework for building a scalable search engine, based on (i) a music description language that represents music content independently of any specific encoding, (ii) an extensible list of feature-extraction functions, and (iii) indexing, searching, and ranking procedures designed to be integrated into the standard architecture of a text-oriented search engine. As a proof of concept, we also detail an actual implementation of the framework for searching in large collections of XML-encoded music scores, based on the popular ElasticSearch system. It is released as open source on GitHub and is available as a ready-to-use Docker image for communities that manage large collections of digitized music documents.

Highlights

  • Search engines have become essential components of the digital space

  • We present a list of features that can be produced from a music content descriptor: a Chromatic Interval Feature (CIF), a Diatonic Interval Feature (DIF), a Rhythm Feature (RF), and a Lyric Feature (LF)

  • We present a practical approach to the problem of indexing a large library of music documents
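
To make the idea of an interval feature feeding a text-oriented index more concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes a melody is given as a list of MIDI pitch numbers, that a chromatic interval is the signed semitone difference between consecutive notes, and that intervals are turned into text tokens and grouped into n-grams. The function names and the token format (`p2`, `m2`) are hypothetical and chosen only for illustration.

```python
from typing import List


def chromatic_intervals(midi_pitches: List[int]) -> List[int]:
    """Signed semitone differences between consecutive pitches (assumed definition)."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]


def encode_interval(interval: int) -> str:
    """Hypothetical text token: 'p' for upward or repeated notes, 'm' for downward."""
    sign = "p" if interval >= 0 else "m"
    return f"{sign}{abs(interval)}"


def interval_ngrams(midi_pitches: List[int], n: int = 3) -> List[str]:
    """Join n consecutive interval tokens into one term a text index could store."""
    tokens = [encode_interval(i) for i in chromatic_intervals(midi_pitches)]
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# C4 D4 E4 D4 C4, and the same melody transposed up a fourth.
melody = [60, 62, 64, 62, 60]
transposed = [65, 67, 69, 67, 65]
print(interval_ngrams(melody))
print(interval_ngrams(transposed))
```

Because only differences between pitches are kept, a melody and its transposition yield the same terms under these assumptions, which is the kind of property that makes interval-based features useful for melodic search.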


Summary

A Framework for Content-Based Search in Large

Tiange Zhu 1,*, Raphaël Fournier-S’niehotta 1, Philippe Rigaux 1 and Nicolas Travers 1,2,*. Research Center, Léonard de Vinci Pôle Universitaire, 92400 Paris La Défense, France. Information Retrieval conference (ISMIR19), Delft, The Netherlands, 4–8 November 2019

Introduction
Related Work
The Music Content Model
The Domain of Sounds
Music Content Descriptors
Non-Musical Domains
Polyphonic Music
Offline Operations
Chromatic Interval Feature
Diatonic Interval Feature
Rhythmic Feature
Lyrics Feature
Text-Based Indexing
A Short Discussion
Searching
Scalability
Ranking for Interval-Based Search
Ranking for Rhythmic-Based Search
Finding Matching Occurrences
Implementation
Global Architecture
Query Processing
Distribution and Aggregation
Highlighting
Interacting with the Server
Data and Performance Evaluation
ES servers
Conclusions and Future Work
