Abstract

Music is a multifaceted phenomenon: beyond addressing our auditory channel, the consumption of music triggers further senses. Multiple modalities are also at play in creating and communicating music. Furthermore, music allows for various ways of interpretation: the same musical piece can be performed in different valid ways, and audiences in turn can respond to and interpret music differently. Music is experienced in many everyday contexts, which are not confined to direct performance and consumption of musical content alone: instead, music is frequently used to contextualize non-musical settings, ranging from audiovisual productions to special situations and events in social communities. Finally, music is a topic under study in many research fields, ranging from the humanities and social sciences to the natural sciences and, with the advent of the digital age, engineering as well. In this thesis, we argue that the full potential of digital music data can only be unlocked when the multifaceted aspects mentioned above are taken into account. Adopting this view, we provide multiple novel studies and methods for problems in Music Information Retrieval: the research field established to create analysis, indexing and access mechanisms for digital music data. A major part of the thesis consists of novel methods for data-driven analyses of multiple recorded music performances. Proposing a top-down approach that investigates similarities and dissimilarities across a corpus of multiple performances of the same piece, we discuss how this information can be used to reveal varying degrees of artistic freedom over the timeline of a musical piece, initially focusing on the analysis of alignment patterns in piano performances.
After this, we move to the underexplored field of comparative analysis of orchestral recordings, proposing how differences between orchestral renditions can be visualized, explained and related to one another by adopting techniques borrowed from visual human face recognition. The other major part of the thesis considers the challenge of automatically suggesting suitable soundtracks for user-generated video. Building on insights from Musicology, Media Studies and Music Psychology, we propose a novel prototype system which explicitly solicits the intended narrative for the video and employs information from collaborative web resources to establish connotative connections to musical descriptors, followed by audiovisual reranking. To assess which features can relevantly be employed in search engine querying scenarios, we further investigate which elements in free-form narrative descriptions evoked by production music are stable, revealing connections to linguistic event structure. Further contributions of the thesis consist of extensive positioning of the newly proposed directions in relation to existing work and to known practical end-user stakeholder demands. As we will show, the paradigms and technical work proposed in this thesis take significant steps forward in employing multimodality, allowing for various ways of interpretation, and opening doors to viable and realistic multidisciplinary approaches that are not solely driven by a technology push. Furthermore, they pave the way for concrete impact on the consumer experience side, which can be built upon more deeply in the near future.
