Abstract

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres across instruments is an issue that has not yet been studied in depth. The goal of this work is twofold: first, to analyze how timbre affects monophonic transcription with an approach based on the CREPE neural network; and second, to improve on those results by performing polyphonic transcription across different timbres with an approach based on the Deep Salience model, which transcribes polyphonic music from the Constant-Q Transform (CQT). The results of the first method show that the timbre and the envelope of the onsets have a strong impact on transcription quality, and the second method shows that the developed model is less dependent on onset strength than state-of-the-art models for piano AMT such as Google Magenta's Onsets and Frames (OaF). Our polyphonic transcription model outperforms the state of the art on non-piano instruments; for bass instruments, for example, it achieves an F-score of 0.9516 versus 0.7102. In a final experiment we also show how adding an onset detector to our model can further improve the results reported in this work.
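To make the reported metric concrete, here is a minimal sketch of how a note-level F-score of this kind can be computed with the mir_eval library; the toy reference and estimated notes, and the choice to ignore offsets, are illustrative assumptions rather than this paper's evaluation setup.

```python
import numpy as np
import mir_eval

# Hypothetical reference and estimated notes: (onset, offset) intervals in
# seconds and pitches in Hz; a real evaluation would load these from MIDI.
ref_intervals = np.array([[0.0, 0.5], [0.5, 1.0], [1.0, 1.5]])
ref_pitches = np.array([440.00, 493.88, 523.25])  # A4, B4, C5
est_intervals = np.array([[0.02, 0.48], [0.51, 1.02]])
est_pitches = np.array([440.00, 493.88])

# With mir_eval defaults, a note counts as correct if its onset lies within
# 50 ms of a reference onset and its pitch within half a semitone;
# offset_ratio=None ignores offsets, a common choice in note tracking.
precision, recall, f_score, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches, offset_ratio=None)
print(f"P={precision:.4f}  R={recall:.4f}  F={f_score:.4f}")
```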

Highlights

  • Automatic Music Transcription (AMT) is the task of transcribing audio recordings of music into music notation; it is one of the main research topics in music signal processing and music information retrieval (MIR), and one of the most competitive tasks in the Music Information Retrieval Evaluation eXchange (MIREX) [1,2]

  • We test the f0 detection and note tracking algorithm (Algorithm 1; see the sketch after this list) on the Slakh2100 dataset [31], which provides MIDI files together with their synthesized audio, and we compare the performance of this approach with the Onsets and Frames (OaF) model [4]

  • It is important to note that the tests were performed on this same dataset, while the OaF model was trained only on the MAESTRO dataset [17], and thus only for piano transcription
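Since Algorithm 1 itself is not reproduced on this page, the following is only a minimal sketch, under assumed thresholds and a simple grouping rule, of what an f0-detection-plus-note-tracking stage can look like: frame-level pitch and voicing confidence are quantized to MIDI and merged into notes.

```python
import numpy as np

def track_notes(times, f0_hz, conf, conf_thresh=0.5, min_dur=0.05):
    """Group a frame-level f0 contour into (onset, offset, midi_pitch) notes.

    Illustrative sketch, not the paper's Algorithm 1: frames whose confidence
    exceeds conf_thresh are rounded to the nearest MIDI pitch, and runs of the
    same pitch are merged into notes lasting at least min_dur seconds.
    """
    midi = np.round(69 + 12 * np.log2(np.maximum(f0_hz, 1e-6) / 440.0)).astype(int)
    notes, start = [], None  # start = index of the current note's first frame
    for i in range(len(times) + 1):
        # A note ends at the last frame, at an unvoiced frame, or on a pitch change.
        ended = i == len(times) or conf[i] < conf_thresh or (
            start is not None and midi[i] != midi[start])
        if start is not None and ended:
            end_t = times[min(i, len(times) - 1)]
            if end_t - times[start] >= min_dur:
                notes.append((times[start], end_t, int(midi[start])))
            start = None
        if i < len(times) and conf[i] >= conf_thresh and start is None:
            start = i
    return notes
```

For monophonic audio, crepe.predict(audio, sr, viterbi=True) returns time, frequency, and confidence arrays (plus the raw activation) that can be fed directly into a function of this shape.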

Introduction

Automatic Music Transcription (AMT) is the task of transcribing audio recordings of music into music notation. It is one of the main research topics in music signal processing and music information retrieval (MIR), and one of the most competitive tasks in the Music Information Retrieval Evaluation eXchange (https://www.music-ir.org/mirex/wiki/MIREX_HOME, accessed March 2021) [1,2]. AMT approaches can be organized into four subtasks or categories: multi-pitch estimation (MPE), also called frame-level transcription; note-level transcription, known as note tracking (NT); stream-level transcription; and notation-level transcription. These categories correspond to the level of abstraction of the music notation one aims to produce. Recent approaches present alternative methods that improve transcription accuracy by reconstructing the input spectrogram [5]. These approaches are trained and tested on piano tracks, but there is a lack of information about how such models behave with other instrument families, or about how they compare under different onset envelopes.
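As a concrete example of the kind of frame-level input representation discussed above, the sketch below computes a log-magnitude Constant-Q Transform with librosa; the hop length, frequency range, and bin resolution are illustrative assumptions and not necessarily the configuration used by the model in this work.

```python
import numpy as np
import librosa

# Load a bundled example clip; a real pipeline would load the target track.
y, sr = librosa.load(librosa.ex("trumpet"), sr=22050)

# CQT spanning 6 octaves from C1 at 60 bins per octave (assumed settings).
cqt = librosa.cqt(y, sr=sr, hop_length=256,
                  fmin=librosa.note_to_hz("C1"),
                  n_bins=360, bins_per_octave=60)

# Log-magnitude version, the usual input to salience-style networks.
log_cqt = librosa.amplitude_to_db(np.abs(cqt), ref=np.max)
print(log_cqt.shape)  # (n_bins, n_frames)
```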
