Abstract

Music transcription is a process of creating a notation of musical sounds. It has been used as a basis for the analysis of music from a wide variety of cultures. Recent decades have seen an increasing amount of engineering research within the field of Music Information Retrieval that aims at automatically obtaining music transcriptions in Western staff notation. However, such approaches are not widely applied in research in ethnomusicology. This article aims to bridge interdisciplinary gaps by identifying aspects of proximity and divergence between the two fields. As part of our study, we collected manual transcriptions of traditional dance tune recordings by eighteen transcribers. Our method employs a combination of expert and computational evaluation of these transcriptions. This enables us to investigate the limitations of automatic music transcription (AMT) methods and computational transcription metrics that have been proposed for their evaluation. Based on these findings, we discuss promising avenues to make AMT more useful for studies in the Humanities. These are, first, assessing the quality of a transcription based on an analytic purpose; secondly, developing AMT approaches that are able to learn conventions concerning the transcription of a specific style; thirdly, a focus on novice transcribers as users of AMT systems; and, finally, considering target notation systems different from Western staff notation.

Highlights

  • Music transcription is a process of creating a notation of musical sounds, with music notation being the representation of musical sound through some other medium

  • By comparing ratings of human experts with computational metrics through corpus and close analysis, we documented differences in how the quality of a transcription is assessed in ethnomusicology and in Music Information Retrieval (MIR)

  • Computational metrics are only partially correlated with human ratings
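To make the last highlight concrete, the sketch below shows one common way to quantify how closely a computational metric tracks human ratings: a Spearman rank correlation. It is a minimal illustration, not the procedure used in the article. The function names and the five toy values are hypothetical and are not taken from the study's data.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values for five transcriptions (NOT data from the study):
# an expert rating on a 1-5 scale and a metric score in [0, 1] for each one.
expert_ratings = [4, 2, 5, 3, 1]
metric_scores = [0.71, 0.65, 0.80, 0.52, 0.48]

rho = spearman(expert_ratings, metric_scores)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 would mean the metric ranks transcriptions almost as the experts do; values well below 1 correspond to the partial agreement described in the highlight. Rank correlation is used here because expert ratings are ordinal, so only the ordering of transcriptions is compared.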


Summary

Introduction

Music transcription is a process of creating a notation of musical sounds, with music notation being the representation of musical sound through some other medium. Ethnomusicologists can choose an approach that is more or less ‘etic’ (cf. ‘phonetic’), whereby whatever is audible is included in case it should turn out to be significant, or ‘emic’ (cf. ‘phonemic’), including only those categories and distinctions considered significant in the culture concerned. The former was typical of early investigators working on sound recordings at a distance from the field. Later, fieldwork, performance study and collaboration with performers made it possible to distinguish ‘structure’ and ‘details’ on the basis of ‘insider’ knowledge of the musical system in question, and transformed transcription into a representation of performance in cultural context (Ellingson, 1992). Although the use of staff notation has been characterized as an ‘unscientific’ procedure (Seeger, 1958), most transcribers have continued to employ it, with or without additional symbols or other modifications (Abraham and von Hornbostel, 1994). Automatic graphic representations such as the ‘melogram’ (Seeger, 1958) offer an alternative with both the advantage and the disadvantage that they bypass the interpretive processes of human cognition, such as the ability to distinguish multiple simultaneous streams of sound (Jairazbhoy, 1977).

