Abstract

Automatic music transcription is the task of inferring a high-level symbolic representation, such as music notation or a piano roll, from a music performance. It has many applications in music education, content-based music search, musicological analysis of non-notated music, and music enjoyment. Existing approaches typically operate in the frequency domain, where the fundamental time-frequency resolution tradeoff limits the transcription accuracy they can achieve. In this project, we develop a novel time-domain approach to piano transcription using convolutional sparse coding. It models the music waveform as a sum of piano note waveforms (dictionary elements) convolved with their temporal activations (onsets). The note waveforms are pre-recorded in a context-dependent way, i.e., on the specific piano to be transcribed in the specific environment. During transcription, the note waveforms are held fixed, and their temporal activations are estimated and post-processed to obtain the pitch and onset transcription. This approach models the temporal evolution of piano notes and estimates pitches and onsets jointly within the same framework. Experiments show that it significantly outperforms a state-of-the-art frequency-domain transcription method trained in the same context-dependent setting, in both transcription accuracy and timing precision, across synthetic, anechoic, noisy, and reverberant environments.
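
Concretely, the model writes the observed waveform x as x ≈ Σ_m d_m ∗ z_m, where d_m is the pre-recorded waveform of note m, z_m is its sparse temporal activation, and ∗ denotes convolution. The following minimal Python sketch illustrates only this synthesis model and the onset post-processing step; the decaying-sinusoid dictionary, the note frequencies, and the detection threshold are illustrative assumptions, and the convolutional sparse coding solver that actually estimates the activations from audio is not reproduced here.

    import numpy as np
    from scipy.signal import fftconvolve

    fs = 44100                      # sample rate (Hz)
    sig_len = 2 * fs                # two seconds of audio
    t = np.arange(fs // 2) / fs     # half-second note templates

    # Illustrative dictionary: one decaying sinusoid per note, standing in
    # for the pre-recorded, context-dependent piano note waveforms.
    freqs = [261.63, 329.63, 392.00]            # C4, E4, G4 (assumed)
    D = [np.sin(2 * np.pi * f * t) * np.exp(-3.0 * t) for f in freqs]

    # Sparse temporal activations: a nonzero entry marks a note onset.
    Z = np.zeros((len(D), sig_len))
    Z[0, int(0.1 * fs)] = 1.0
    Z[1, int(0.6 * fs)] = 0.8
    Z[2, int(1.1 * fs)] = 0.9

    # Synthesis model: x = sum_m (d_m convolved with z_m), plus noise.
    x = sum(fftconvolve(Z[m], D[m])[:sig_len] for m in range(len(D)))
    x += 0.01 * np.random.randn(sig_len)

    # Post-processing of the activations (here the ground-truth Z, standing
    # in for a solver's output): threshold and report pitch/onset pairs.
    threshold = 0.5
    for m, f in enumerate(freqs):
        onsets = np.flatnonzero(Z[m] > threshold) / fs
        print(f"{f:7.2f} Hz: onset(s) at {onsets} s")

In the full pipeline described above, Z would be estimated from the recorded audio with the dictionary held fixed; the thresholding loop corresponds to the post-processing that turns activations into the final pitch and onset transcription.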
