Abstract

There is increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, detecting the different layers of information in these documents still poses difficulties. Deep Neural Network techniques have reported outstanding results in many areas related to computer vision. Consequently, in this paper, we study Convolutional Neural Networks (CNNs) for the automatic document processing of music score images. This process focuses on separating the image into its constituent layers (namely background, staff lines, music notes, and text) by training a classifier with examples of these parts. Comprehensive experimentation on the configuration of the networks was carried out, yielding interesting results with regard to both the efficiency and the effectiveness of these models. In addition, a cross-manuscript adaptation experiment is presented in which the networks are evaluated on a manuscript different from the one on which they were trained. The results suggest that the CNN is capable of adapting its knowledge, so starting from a pre-trained CNN reduces (or eliminates) the need for new labeled data.

Highlights

  • Significant efforts to preserve music heritage have been made in recent decades. Digitization has significantly improved access to these sources while ensuring their physical preservation; to make the music contained in these documents truly browsable and searchable, it is necessary to encode the symbolic information into a structured digital format such as MusicXML or Music Encoding Initiative (MEI).

  • We define document processing as the detection and categorization of the different layers of information contained in the music score image.

  • Since the possible combinations of all the different parameters lead to a huge set of different neural networks to be trained per fold, we propose a serialization of the experiments
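The last highlight can be made concrete with a small sketch. It assumes that "serialization" means tuning one hyper-parameter at a time and fixing the best value before moving to the next; the source does not spell out the procedure, so this is only one plausible reading. The search space, the `toy_score` function, and all names below are hypothetical stand-ins; a real run would train and validate a CNN inside `evaluate`.

```python
from math import prod
from typing import Any, Callable, Dict, List, Tuple

Space = Dict[str, List[Any]]


def grid_size(space: Space) -> int:
    """Number of networks to train if every combination is tried."""
    return prod(len(values) for values in space.values())


def serial_size(space: Space) -> int:
    """Number of networks to train if parameters are tuned one at a time."""
    return sum(len(values) for values in space.values())


def serial_search(
    space: Space, evaluate: Callable[[Dict[str, Any]], float]
) -> Tuple[Dict[str, Any], int]:
    """Tune each hyper-parameter in turn, keeping earlier choices fixed.

    Returns the selected configuration and the number of evaluations used.
    """
    # Start from the first listed value of every hyper-parameter.
    current = {name: values[0] for name, values in space.items()}
    evaluations = 0
    for name, values in space.items():
        best_value, best_score = values[0], float("-inf")
        for value in values:
            candidate = dict(current, **{name: value})
            score = evaluate(candidate)
            evaluations += 1
            if score > best_score:
                best_value, best_score = value, score
        current[name] = best_value  # freeze the winner before moving on
    return current, evaluations


# Hypothetical search space; the values are illustrative, not from the paper.
SPACE: Space = {
    "layers": [1, 2, 3],
    "filters": [16, 32, 64],
    "kernel": [3, 5, 7],
}


def toy_score(config: Dict[str, Any]) -> float:
    """Stand-in for validation accuracy; peaks at layers=2, filters=32, kernel=5."""
    return -(
        abs(config["layers"] - 2)
        + abs(config["filters"] - 32) / 16
        + abs(config["kernel"] - 5) / 2
    )


best_config, used = serial_search(SPACE, toy_score)
```

With three values for each of three hyper-parameters, the full grid would need 27 trained networks per fold, while the serialized sweep needs 9. The trade-off is that a one-at-a-time search can miss interactions between hyper-parameters, which fits the stated goal of studying their influence rather than proving an optimum.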


Summary

Introduction

Significant efforts to preserve music heritage have been made in recent decades. Digitization has significantly improved access to these sources while ensuring their physical preservation; to make the music contained in these documents truly browsable and searchable, it is necessary to encode the symbolic information into a structured digital format such as MusicXML or Music Encoding Initiative (MEI). Each pixel of the image is queried, and its feature block is forwarded to and processed by the network. Concerning the CNN configuration, it is still an open question which hyper-parameters (e.g., number of layers, number of filters per layer, size of the filters) matter more or less for this task. This is why we carried out a thorough study of different CNN configurations. The idea is not to find an optimal configuration, which would be infeasible to demonstrate, but to study which hyper-parameter choices make the greatest difference in performance and computational cost, and to analyze the best level of accuracy that can be attained using this approach.
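The per-pixel querying described above can be sketched as follows. This is a minimal illustration, assuming that the "feature block" is a square window centred on the queried pixel and that out-of-image positions are padded with a constant; neither detail is confirmed by the text. `toy_classifier` is a hypothetical stand-in for the trained CNN, and the label order in `LAYERS` is likewise an assumption.

```python
from typing import Callable, List

Image = List[List[float]]

# The four layers named in the abstract; the index order is an assumption.
LAYERS = ("background", "staff lines", "music notes", "text")


def extract_block(
    image: Image, row: int, col: int, half: int, pad: float = 0.0
) -> Image:
    """Return the (2*half+1) x (2*half+1) window centred on (row, col).

    Positions that fall outside the image are filled with `pad`.
    """
    height, width = len(image), len(image[0])
    block = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            line.append(image[r][c] if inside else pad)
        block.append(line)
    return block


def label_image(
    image: Image, classify: Callable[[Image], int], half: int
) -> List[List[int]]:
    """Query every pixel: extract its block and let the classifier label it."""
    return [
        [classify(extract_block(image, r, c, half)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def toy_classifier(block: Image) -> int:
    """Hypothetical stand-in for the CNN: thresholds the centre pixel only."""
    centre = block[len(block) // 2][len(block) // 2]
    return 0 if centre < 0.5 else 2  # 0 = background, 2 = music notes


tiny_image: Image = [[0, 1], [1, 0]]
label_map = label_image(tiny_image, toy_classifier, half=1)
```

Here `label_map` has the same shape as the input, with one layer index per pixel. In practice the classifier would be the trained network, and the window size (`half`) would be one of the hyper-parameters under study.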
