Abstract

The goal of automatic music transcription (AMT) is to obtain a high-level symbolic representation of the notes played in a given audio recording. Although AMT has been researched for several decades, current methods are still inadequate for many applications. To boost the accuracy in a music tutoring scenario, we exploit the fact that the score to be played is known, so that we only need to detect the differences between the score and the actual performance. In contrast to previous work that uses score information for postprocessing, we employ the score to construct a transcription method that is tailored to the given audio recording. By adapting a score-informed dictionary learning technique as used for source separation, we learn for each score pitch a spectral pattern describing the energy distribution of associated notes in the recording. In this paper, we identify several systematic weaknesses in our previous approach and introduce three extensions to improve its performance. First, we extend our dictionary of spectral templates to a dictionary of variable-length spectrotemporal patterns. Second, we integrate the score information using soft rather than hard constraints, to better account for the fact that deviations from the score do occur. Third, we introduce new regularizers to guide the learning process. Our experiments show that these extensions particularly improve the accuracy for identifying extra notes, while the accuracy for correct and missing notes remains at a similar level. The influence of each extension is demonstrated with further experiments.

Highlights

  • Automatic music transcription (AMT) has been an active research area for several decades and is often considered to be a key technology in music signal processing [1]

  • We introduce a score-informed transcription method to identify missing and extra notes in piano recordings

  • By incorporating score information into the dictionary learning process, our method yields spectral patterns for each pitch that are closely adapted to the given recording
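To make the idea of score-informed dictionary learning more concrete, the following is a minimal sketch and not the method of the paper. It assumes a plain non-negative matrix factorization with Euclidean multiplicative updates and hard score constraints: activations are initialized to zero wherever the score does not expect a pitch, and multiplicative updates keep those zeros in place. The function name `score_informed_nmf`, the argument `score_mask`, and all parameter values are hypothetical, and the extensions described in the abstract (spectrotemporal patterns, soft constraints, regularizers) are not modeled.

```python
import numpy as np

def score_informed_nmf(V, score_mask, n_iter=200, eps=1e-12, seed=0):
    """Factorize a magnitude spectrogram V (freq x time) as W @ H.

    score_mask (pitch x time) is 1 where the score allows a pitch to sound
    and 0 elsewhere. This is an illustrative hard-constraint baseline.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    n_pitch = score_mask.shape[0]

    # One spectral template per score pitch, randomly initialized.
    W = rng.random((n_freq, n_pitch)) + eps
    # Activations start at zero outside the score-permitted regions.
    H = rng.random((n_pitch, n_time)) * score_mask

    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean cost; entries of H
        # that are zero stay zero, which enforces the hard constraint.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)

        # Normalize each template to unit sum and rescale the
        # activations so that the product W @ H is unchanged.
        norms = W.sum(axis=0, keepdims=True) + eps
        W /= norms
        H *= norms.T

    return W, H
```

In this sketch, each column of `W` plays the role of a spectral pattern learned for one score pitch from the recording itself. Widening `score_mask` in time or pitch would be one simple way to leave room for deviations from the score.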


Summary

Introduction

Automatic music transcription (AMT) has been an active research area for several decades and is often considered to be a key technology in music signal processing [1]. Its applications range from content-based music retrieval and interactive music interfaces [1] to musicological analysis [2], music education [3] and note-based audio processing [4]. While for certain applications the accuracy of state-of-the-art methods is sufficiently high, current methods still do not reach the sophistication of a transcription made by human experts. Current methods seem to have reached a plateau in performance and it has become increasingly difficult to make significant improvements [1]. Many interesting applications involving AMT technologies remain infeasible.

