Abstract

Optical Music Recognition (OMR) is the research field focused on the automatic reading of music from scanned images. Its main goal is to encode the content into a digital and structured format with the advantages that this entails. This discipline is traditionally aligned to a workflow whose first step is the document analysis. This step is responsible of recognizing and detecting different sources of information—e.g. music notes, staff lines and text—to extract them and then processing automatically the content in the following steps of the workflow. One of the most difficult challenges it faces is to provide a generic solution to analyze documents with diverse resolutions. The endless number of existing music sources does not meet a standard that normalizes the data collections, giving complete freedom for a wide variety of image sizes and scales, thereby making this operation unsustainable. In the literature, this question is commonly overlooked and a uniform scale is assumed. In this paper, a machine learning-based approach to estimate the scale of music documents with respect to a reference scale is presented. Our goal is to propose a robust and generalizable method to adapt the input image to the requirements of an OMR system. For this, two goal-directed case studies are included to evaluate the proposed approach over common task within the OMR workflow, comparing the behavior with other state-of-the-art methods. Results suggest that it is necessary to perform this additional step in the first stage of the workflow to correct the scale of the input images. In addition, it is empirically demonstrated that our specialized approach is more promising than image augmentation strategies for the multi-scale challenge.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call