Abstract
Identifying and segmenting the table of contents (TOC) and index pages is a natural task in the development of a digital library. A digital document library provides a cheap, flexible, and non-labour-intensive way to store, represent, and manage paper documents in electronic form, facilitating indexing, viewing, printing, and extraction of the intended portions. Using document image analysis techniques, information from the TOC and index pages can be extracted and used in a document database for effective retrieval of the required information. In this paper, we present a fully automatic approach to identifying and segmenting TOC and index pages from scanned documents.