Abstract

This paper introduces a new approach for automatically identifying the temporal origin of digitized historical documents stored as images, using examples from the Balkan region. The approach rests on the idea that differences in orthography style reflect the evolution of scripts or languages over time. It begins with a script coding phase, which maps the letters of the document into a sequence of numerical codes. Each code is associated with a gray level in the image space, so the sequence of numerical codes can be transformed into an image. Texture analysis is then applied to this image to extract the document's features. Finally, the feature vector of the document is classified to recognize its orthography style. An experiment is performed on two databases and on a test collection of historical documents extracted from digitized books in the Slavonic–Serbian and Serbian languages written in Cyrillic script, and in the Croatian recension of the Old Church Slavonic language written in angular Glagolitic script. The results show the efficacy of the proposed approach, its robustness to 'noisy' documents, and its superiority over other approaches in the literature that use language or script discrimination for orthography recognition.
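The pipeline described above (letters to codes, codes to gray levels, texture features from the resulting image) can be illustrated with a minimal sketch. The abstract does not give the coding table or the texture measure, so everything named here is an assumption: the function names, the one-code-per-alphabet-letter table, and the adjacent-code co-occurrence statistic used as a stand-in texture descriptor. The classification step is omitted.

```python
from collections import Counter


def code_letters(text, alphabet):
    """Map each letter found in `alphabet` to an integer code.

    Assumption: one code per alphabet letter. Characters outside the
    alphabet (spaces, punctuation) are skipped.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in text if ch in index]


def codes_to_gray(codes, n_codes):
    """Spread the n_codes codes evenly over gray levels 0..255.

    The result is a 1-D grayscale "image" of the coded text.
    """
    if n_codes < 2:
        return [0 for _ in codes]
    return [round(c * 255 / (n_codes - 1)) for c in codes]


def cooccurrence_features(codes, n_codes):
    """Normalized co-occurrence of neighbouring codes, flattened row by row.

    This is a stand-in texture measure chosen for illustration, not
    necessarily the one used in the paper.
    """
    counts = Counter(zip(codes, codes[1:]))
    total = sum(counts.values())
    return [
        counts[(a, b)] / total if total else 0.0
        for a in range(n_codes)
        for b in range(n_codes)
    ]


# Toy example with a hypothetical four-letter alphabet.
alphabet = "abcd"
codes = code_letters("abcd ab", alphabet)
gray = codes_to_gray(codes, len(alphabet))
features = cooccurrence_features(codes, len(alphabet))
```

In this sketch the feature vector has one entry per ordered pair of codes. A classifier trained on such vectors from documents of known orthography style would complete the pipeline.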
