Abstract

State-of-the-art Optical Music Recognition (OMR) techniques follow an end-to-end or holistic approach, i.e., a single stage that processes a single-staff section image and retrieves the symbols that appear therein. Such recognition systems do not require an exact alignment between each staff and its corresponding labels, which facilitates the creation and retrieval of labeled corpora. Most commonly, these approaches consider an agnostic music representation, which characterizes music symbols by their shape and height (vertical position in the staff). However, this double nature is ignored because, in the learning process, the two features are treated as a single symbol. This work aims to exploit this hallmark, which differentiates music notation from other similar domains such as text, by introducing a novel end-to-end approach to solve the OMR task at the staff-line level. We consider two Convolutional Recurrent Neural Network (CRNN) schemes trained to simultaneously extract the shape and height information, and propose different policies for eventually merging them at the neural level. The results obtained for two corpora of monophonic early music manuscripts show that our proposal significantly decreases the recognition error, by between 14.4% and 25.6% in the best-case scenarios, when compared to the baseline considered.
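
The contrast between treating each symbol as one combined label and treating shape and height as two parallel label sequences can be illustrated with a small sketch. The label strings (such as `note.quarter:L2`), the `:` separator, and the function name `split_labels` are hypothetical and chosen only for illustration; they are not the encoding of the corpora used in this work.

```python
# Hypothetical agnostic labels in "shape:height" form; the token names and
# the ":" separator are illustrative assumptions, not the paper's encoding.
staff = [
    "clef.C:L3",
    "note.half:L2",
    "note.half:S2",
    "note.quarter:L2",
    "note.quarter:S2",
    "rest.half:L3",
]


def split_labels(sequence):
    """Split combined labels into parallel shape and height sequences."""
    shapes, heights = [], []
    for label in sequence:
        shape, height = label.split(":")
        shapes.append(shape)
        heights.append(height)
    return shapes, heights


shapes, heights = split_labels(staff)

# Vocabulary a classifier must cover under each formulation.
combined_vocab = sorted(set(staff))
shape_vocab = sorted(set(shapes))
height_vocab = sorted(set(heights))

print(len(combined_vocab), len(shape_vocab), len(height_vocab))  # prints: 6 4 3
```

In this toy staff, the combined formulation needs one class per shape-height pair, whereas the split formulation covers the same symbols with two smaller vocabularies.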

Highlights

  • Music is one of the cornerstones of cultural heritage [1]

  • We propose three different end-to-end architectures that differ in the point at which the two Convolutional Recurrent Neural Network (CRNN) models are joined: (i) PreRNN, which joins the features extracted by each model right before the recurrent block; (ii) InterRNN, which performs this process after the first recurrent layer; and (iii) PostRNN, which gathers both sources of information after the recurrent block

  • Current state-of-the-art Optical Music Recognition (OMR) technologies, which are based on Convolutional Recurrent Neural Networks (CRNN), typically follow an end-to-end approach that operates at the staff level: they map the series of symbols that appear in an image of a single staff to a sequence of music symbol labels
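
The three merge points named in the highlights (PreRNN, InterRNN, PostRNN) can be sketched structurally as follows. This is only a toy under stated assumptions: the layers are plain-Python stand-ins rather than trained networks, the recurrent block is assumed to have two layers, and every function name (`conv_block`, `recurrent_layer`, `concat`, `forward`) is hypothetical. Only the position of the merge follows the description above.

```python
# Toy stand-ins for the real layers: frames are plain lists of floats.
# Layer internals, the two-layer recurrent block, and all function names
# are illustrative assumptions; only the merge points follow the text.


def conv_block(columns, gain):
    """Stand-in for a convolutional block: one feature vector per frame."""
    return [[gain * v for v in col] for col in columns]


def recurrent_layer(frames):
    """Stand-in for a recurrent layer: running mean over past frames."""
    out, total = [], [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        total = [s + v for s, v in zip(total, frame)]
        out.append([s / t for s in total])
    return out


def concat(a, b):
    """Frame-wise concatenation of two feature sequences."""
    return [fa + fb for fa, fb in zip(a, b)]


def forward(columns, policy):
    shape = conv_block(columns, gain=1.0)    # shape branch
    height = conv_block(columns, gain=2.0)   # height branch
    if policy == "PreRNN":
        # Merge right before the recurrent block, which is then shared.
        return recurrent_layer(recurrent_layer(concat(shape, height)))
    if policy == "InterRNN":
        # Merge after each branch's first recurrent layer.
        merged = concat(recurrent_layer(shape), recurrent_layer(height))
        return recurrent_layer(merged)
    if policy == "PostRNN":
        # Merge after each branch has run its whole recurrent block.
        shape = recurrent_layer(recurrent_layer(shape))
        height = recurrent_layer(recurrent_layer(height))
        return concat(shape, height)
    raise ValueError(f"unknown policy: {policy}")


image_columns = [[0.1, 0.4], [0.3, 0.2], [0.5, 0.9]]  # 3 frames, 2 features
for policy in ("PreRNN", "InterRNN", "PostRNN"):
    frames = forward(image_columns, policy)
    print(policy, len(frames), len(frames[0]))
```

Each policy keeps one output vector per input frame and doubles the feature width through the concatenation. Because the stand-in layers act on every feature independently, the three policies yield the same numbers in this toy; with learned layers that mix features, the merge point would change what each recurrent layer can see.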

Summary

Introduction

Music is one of the cornerstones of cultural heritage [1]. Throughout history, the main means of transmitting and preserving this art has been its engraving in so-called music scores, i.e., documents in which composers graphically encode a piece of music as well as the way to perform it [2]. Bainbridge and Bell [9] described and formalized the de facto standard OMR workflow, which was later thoroughly reviewed by Rebelo et al. [10]. This sequential pipeline comprises four main blocks: (i) image preprocessing, which aims to mitigate problems mostly related to the scanning process and paper quality; (ii) symbol segmentation and classification, which focuses on detecting and labeling the different elements of the image to be recognized; (iii) reconstruction of the music notation, which postprocesses the recognition output; and (iv) an output encoding stage that stores the recognized elements in a suitable symbolic format. While our proposal focuses on staff-line symbol recognition, it must be noted that research efforts are also devoted to full-page recognition, such as the proposal by Castellanos et al. [21].
