Abstract

The recognition of patterns with a time dependency is common in areas such as speech recognition and natural language processing. The equivalent situation in image analysis arises in tasks such as text or video recognition. Recently, Recurrent Neural Networks (RNNs) have been broadly applied to solve these tasks in an end-to-end fashion, with good results. However, their application to Optical Music Recognition (OMR) is not as straightforward, owing to the presence of different elements at the same horizontal position, which disrupts the linear flow of the timeline. In this paper, we study the ability of RNNs to learn encodings that represent this disruption in homophonic scores. The results show that our serialized ways of encoding the music content are appropriate for Deep Learning-based OMR and deserve further study.
