Abstract

We propose three automatic algorithms for analyzing digitized medieval manuscripts: text block computation, text line segmentation, and special component extraction. They build on existing clustering algorithms and a template-matching technique. All three methods are fully automatic and require no user intervention or input. Moreover, they operate on a per-page basis: unlike some prior methods, which need a set of pages from the same manuscript for training, they can analyze a single page without any additional pages as input. We extensively evaluated the algorithms on 1,771 page images from six publicly available historical manuscripts, which differ significantly from each other in layout structure, acquisition resolution, writing style, and other respects. The experimental results indicate satisfactory performance; for example, the average precision and recall obtained by the text block computation method reach as high as 98% and 99%, respectively.
