Abstract

This work presents an end-to-end method based on deep neural networks for audio-to-score music transcription of monophonic excerpts. Unlike existing music transcription methods, which typically perform pitch estimation, the proposed approach is formulated as an end-to-end task that outputs a notation-level music score. Using an audio file as input, modeled as a sequence of frames, a deep neural network is trained to produce a sequence of music symbols encoding a score, including key and time signatures, barlines, notes (with their pitch spelling and duration), and rests. Our framework is based on a Convolutional Recurrent Neural Network (CRNN) with a Connectionist Temporal Classification (CTC) loss function, trained in an end-to-end fashion without requiring alignment between the input frames and the output symbols. A total of 246,870 incipits from the Répertoire International des Sources Musicales online catalog were synthesized using different timbres and tempos to build the training data. Alternative input representations (raw audio, Short-Time Fourier Transform (STFT), log-spaced STFT, and Constant-Q Transform) were evaluated for this task, as well as different output representations (Plaine & Easie Code, Kern, and a purpose-designed output). Results show that it is feasible to directly infer score representations from audio files, and that most errors come from music notation ambiguities and metering (time signatures and barlines).
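To make the framework concrete, the following is a minimal sketch (in PyTorch) of a CRNN trained with CTC loss to map a spectrogram-like input (e.g. a Constant-Q Transform) to a sequence of score symbols. It is not the authors' exact architecture: layer sizes, the vocabulary size, and the input dimensions are illustrative assumptions; the point is that CTC lets the frame-wise network outputs be trained against a shorter symbol sequence without any frame-level alignment.

```python
# Illustrative CRNN + CTC sketch (assumed hyperparameters, not the paper's exact model).
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_bins=84, vocab_size=100):  # vocab_size = symbols + CTC blank
        super().__init__()
        # Convolutional front end over the (frequency, time) input.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        feat_dim = 64 * (n_bins // 4)          # channels * pooled frequency bins
        # Recurrent part models temporal context across frames.
        self.rnn = nn.LSTM(feat_dim, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, vocab_size)

    def forward(self, x):                      # x: (batch, 1, n_bins, n_frames)
        h = self.conv(x)                       # (batch, 64, n_bins // 4, n_frames)
        h = h.permute(0, 3, 1, 2).flatten(2)   # (batch, n_frames, feat_dim)
        h, _ = self.rnn(h)
        return self.fc(h).log_softmax(-1)      # per-frame symbol log-probabilities

# One training step: CTC aligns frame-wise predictions with the shorter
# target symbol sequence, so no frame-level annotation is needed.
model = CRNN()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

cqt = torch.randn(4, 1, 84, 200)              # stand-in for a batch of CQT frames
targets = torch.randint(1, 100, (4, 30))      # encoded score symbols (labels 1..99)
log_probs = model(cqt).permute(1, 0, 2)       # CTC expects (time, batch, classes)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200, dtype=torch.long),
           target_lengths=torch.full((4,), 30, dtype=torch.long))
loss.backward()
```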
