Abstract

Optical music recognition is a research field whose efforts have been mainly focused, due to the difficulties involved in its processes, on document and image recognition. However, there is a final step after the recognition phase that has not been properly addressed or discussed, and which is relevant to obtaining a standard digital score from the recognition process: the step of encoding data into a standard file format. In this paper, we address this task by proposing and evaluating the feasibility of using machine translation techniques, using statistical approaches and neural systems, to automatically convert the results of graphical encoding recognition into a standard semantic format, which can be exported as a digital score. We also discuss the implications, challenges and details to be taken into account when applying machine translation techniques to music languages, which are very different from natural human languages. This needs to be addressed prior to performing experiments and has not been reported in previous works. We also describe and detail experimental results, and conclude that applying machine translation techniques is a suitable solution for this task, as they have proven to obtain robust results.

Highlights

  • As a part of human cultural heritage, musical compositions have been transmitted over the centuries

  • With the statistical machine translation (SMT) method, the results differ noticeably depending on the type of agnostic notation used in each corpus

  • We have studied the application of machine translation (MT) techniques to performing the encoding step in an optical music recognition pipeline, which had not been properly addressed prior to our study

Introduction

As a part of human cultural heritage, musical compositions have been transmitted over the centuries. One of the means of preserving and transmitting such compositions is by visually encoding them in documents called music scores. Given the cost of manual transcription, automatic processing would be preferable, and could be done in the same way as transcribing text from images of documents. Optical music recognition, usually referred to as OMR, is a field of research that investigates how to computationally read music notation in documents [1]. Most of the existing literature on OMR is framed within a multi-stage workflow, with steps involving image binarization and staff-line detection and removal [2,3]; symbol classification [4,5]; notation assembly [6,7]; and semantic encoding. This last step is the subject of this work.
