Abstract

In recent years, research on automatic music transcription has made significant progress, as deep learning techniques have demonstrated strong performance on complex data. Although existing work is promising, it relies on task-specific domain knowledge to design model architectures and training schemes for different tasks. In addition, the noise introduced while collecting data for automatic music transcription cannot be ignored, and it degrades the results of existing methods. To address these issues, we propose an end-to-end framework based on the Transformer. Using an encoder-decoder structure, the model converts the spectrogram of recorded piano audio directly into MIDI output. Further, to reduce the impact of environmental noise on transcription quality, we design a training mechanism that mixes white noise into the training data to improve the robustness of the proposed model. Our experiments on classic piano transcription datasets show that the proposed method can greatly improve the quality of automatic music transcription.
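
As an illustration of the white-noise training idea, the following is a minimal sketch of mixing white Gaussian noise into a clean waveform at a chosen signal-to-noise ratio. The abstract does not specify how the noise is generated or scaled, so the function name `mix_white_noise`, the `snr_db` parameter, and the SNR-based scaling are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def mix_white_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added.

    Hypothetical helper: the noise level is set by a target
    signal-to-noise ratio in dB, which is an assumption of this sketch.
    `signal` must be a non-empty sequence of floats.
    """
    if rng is None:
        rng = random.Random()
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested SNR: SNR = 10 * log10(Ps / Pn).
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    # Add an independent zero-mean Gaussian sample to every audio sample.
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Example input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
noisy = mix_white_noise(clean, snr_db=20, rng=random.Random(0))
```

In a training loop, a function like this would typically be applied to each audio clip before the spectrogram is computed, possibly with a randomly drawn `snr_db` per clip so the model sees a range of noise levels.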
