Abstract

Offline handwritten text recognition (HTR) for historical documents aims at effective transcription by addressing challenges that stem from the low quality of the manuscripts under study as well as from particularities of the historical period in which they were written. This paper focuses on the transcription of Greek historical manuscripts, which exhibit several such particularities. To this end, a convolutional recurrent neural network architecture is proposed that comprises octave convolution and recurrent units with effective gating mechanisms. The proposed architecture has been evaluated on three newly created collections of Greek historical handwritten documents, which will be made publicly available for research purposes, as well as on standard datasets such as IAM and RIMES. The evaluation consists of a concise study showing that, compared to state-of-the-art architectures, the proposed one deals effectively with the challenging Greek historical manuscripts.

Highlights

  • Offline handwritten text recognition (HTR) in historical documents has become an attractive research field in computer vision, as it enables us to access our written past. The motivation for this work is the analysis of historical texts from the Greek Byzantine literature tradition, spanning the fourth to the fifteenth century.

  • HTR systems targeting this era face several challenges caused by the age of the historical manuscripts, which affects the clarity of the writing and the image quality in general.

  • The language used in the writing results in increased complexity due to the multitude of diacritics, punctuation and abbreviating symbols that were used, leading to an increased character set compared to modern languages.

Summary

Introduction

Offline handwritten text recognition (HTR) in historical documents has become an attractive research field in computer vision, as it enables us to access our written past. The motivation for this work is the analysis of historical texts from the Greek Byzantine literature tradition, spanning the fourth to the fifteenth century. The language in these texts is not homogeneous throughout the entire period, and the influence of the classical Greek language is prominent. The language used in the writing results in increased complexity due to the multitude of diacritics, punctuation and abbreviating symbols that were used, leading to an increased character set compared to modern languages. The complexity is further increased by the fact that the content of such documents is unconstrained and might have been created by multiple writers.
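To make the point about the enlarged character set concrete, the following minimal sketch uses Python's standard `unicodedata` module to decompose a few polytonic forms of the letter alpha into a base letter plus combining diacritics. The helper name `base_and_marks` and the chosen sample characters are illustrative assumptions and are not taken from the paper or its datasets; the sketch only shows why a recognizer that treats each precomposed form as its own class faces many more output symbols than one for a script without stacked diacritics.

```python
import unicodedata

def base_and_marks(ch):
    """Split one character into its base letter and combining marks.

    Hypothetical helper for illustration only; relies on Unicode
    canonical decomposition (NFD).
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    marks = [unicodedata.name(m) for m in decomposed[1:]]
    return base, marks

# Five visually distinct forms of alpha (illustrative sample, written as
# escapes so the code points are unambiguous):
variants = [
    "\u03b1",  # alpha, no diacritics
    "\u03ac",  # alpha with acute accent
    "\u1f00",  # alpha with smooth breathing
    "\u1f04",  # alpha with smooth breathing and acute accent
    "\u1f84",  # alpha with smooth breathing, acute accent and iota subscript
]

for ch in variants:
    base, marks = base_and_marks(ch)
    print(f"U+{ord(ch):04X} -> base U+{ord(base):04X}, marks: {marks}")
```

Each variant is a separate symbol on the page but shares the same base letter, so the number of distinct classes grows with every admissible combination of breathing, accent and subscript.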
