Abstract

In recent years, research on automatic music transcription has made significant progress, as deep learning techniques have demonstrated strong performance on complex data. Although existing work is promising, it relies on domain-specific knowledge to design model architectures and training schemes for each task. At the same time, the noise introduced while collecting automatic music transcription data cannot be ignored, and it leaves existing methods unsatisfactory. To address these issues, we propose an end-to-end framework based on the Transformer. Through an encoder-decoder structure, it directly converts the spectrogram of recorded piano audio into MIDI output. Further, to remove the impact of environmental noise on transcription quality, we design a training mechanism that mixes white noise into the audio to improve the robustness of the proposed model. Experiments on classic piano transcription datasets show that the proposed method greatly improves transcription quality.
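The abstract does not give implementation details for the white-noise training mechanism. A minimal sketch of one common form of such augmentation, mixing Gaussian white noise into the training audio at a target signal-to-noise ratio, might look like the following (the function name, the SNR parameterization, and the default of 20 dB are assumptions, not taken from the paper):

```python
import numpy as np

def mix_white_noise(audio: np.ndarray, snr_db: float = 20.0, rng=None) -> np.ndarray:
    """Return a copy of `audio` with Gaussian white noise added at `snr_db`.

    Hypothetical helper illustrating noise-mixing augmentation; the paper's
    actual mechanism may differ (e.g. in noise level scheduling).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(audio ** 2)
    noise = rng.standard_normal(audio.shape)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so 10*log10(signal_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return audio + scale * noise
```

At training time, each waveform would be passed through such a function before computing the spectrogram, so the model learns to transcribe under noisy conditions.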
