Many music-retrieval systems have been proposed, based on a variety of principles, but how their effectiveness compares is not clear. The evaluation of some systems has been informal, lacking the rigor applied in other areas of information retrieval, and comparison between systems is difficult because there is no common data set, query set, or set of relevance judgments. In this paper we explain how we collected artificial and expert music queries and name-based relevance judgments, and describe software we developed for the collection of manual relevance judgments. Together with a collection of downloaded Musical Instrument Digital Interface (MIDI) files, these sets of queries and relevance judgments provide valuable tools for measuring the effectiveness of music-retrieval systems. As an example of their value, we use them to compare the effect of using the expert queries and manual judgments with that of the artificial queries and manual judgments used in our earlier experiments.
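To make the evaluation setting concrete, the sketch below shows how a query set and its relevance judgments can be turned into a single effectiveness score. This is a minimal illustration using standard mean average precision, not the paper's own measure; the query identifiers, MIDI file names, and judgments are hypothetical placeholders.

```python
# Minimal sketch: scoring ranked retrieval runs against relevance judgments.
# All data below is hypothetical and only illustrates the general workflow.

def average_precision(ranked_results, relevant):
    """Average precision of one ranked result list against a judged relevant set."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_results, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs, judgments):
    """Mean AP over all queries.

    `runs` maps a query id to a ranked list of retrieved document ids;
    `judgments` maps a query id to the set of ids judged relevant.
    """
    scores = [average_precision(runs[q], judgments.get(q, set())) for q in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical example: two queries run against a small MIDI collection.
    runs = {
        "q1": ["tune_03.mid", "tune_17.mid", "tune_01.mid"],
        "q2": ["tune_09.mid", "tune_02.mid", "tune_11.mid"],
    }
    judgments = {
        "q1": {"tune_03.mid", "tune_01.mid"},
        "q2": {"tune_02.mid"},
    }
    print(f"MAP = {mean_average_precision(runs, judgments):.3f}")
```

With shared queries and judgments of this kind, two retrieval systems can be compared directly by scoring their runs against the same judgment set.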