Abstract

Massive digital acquisition and preservation of deteriorating historical and artistic documents is of particular importance due to their value and fragile condition. The study and browsing of such digital libraries is invaluable for scholars in the Cultural Heritage field but requires automatic tools for analyzing and indexing these datasets. We present two completely automatic methods requiring no human intervention: text height estimation and text line extraction. Our proposed methods have been evaluated on a huge heterogeneous corpus of illuminated medieval manuscripts of different writing styles and with various problematic attributes, such as holes, spots, ink bleed-through, ornamentation, background noise, and overlapping text lines. Our experimental results demonstrate that these two new methods are efficient and reliable, even when applied to very noisy and damaged old handwritten manuscripts.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.