OCR-Free Table of Contents Detection in Urdu Books

Adnan Ul-Hasan, Syed Saqib Bukhari, Thomas M Breuel, Faisal Shafait

doi:10.1109/das.2012.59

Publication Date: Mar 1, 2012

Citations: 14

Affiliation: University of Kaiserslautern, German Research Centre for Artificial Intelligence

#Table Of Contents Pages #OCR Technology

Abstract

A Table of Contents (ToC) is an integral part of multiple-page documents such as books and magazines. Most existing techniques rely on textual similarity to detect ToC pages automatically. However, such techniques cannot be applied where OCR technology is not available, which is the case for historical documents and for many modern Nabataean (Arabic) and Indic scripts. It is therefore necessary to develop tools for navigating such documents without the use of OCR. This paper reports a preliminary effort to address this challenge. The proposed algorithm has been applied to find ToC pages in Urdu books, achieving an overall initial accuracy of 88%.
