Abstract

Identifying and segmenting the table of contents (TOC) and index pages is an obvious task in developing a digital library. A digital document library is created to provide a cheap, flexible, and non-labour-intensive way to store, represent, and manage paper documents in electronic form, facilitating indexing, viewing, printing, and extraction of the intended portions. Using document image analysis techniques, information from the TOC and index pages can be extracted and used in a document database for effective retrieval of the required pieces of information. In this paper, we present a fully automatic approach to identifying and segmenting TOC and index pages in scanned documents.
