Abstract

Digital libraries are becoming an integral part of our day-to-day life. Digitized books and manuscripts in many of these libraries are often stored as images or graphics. Very often, they cannot be searched at the content level because robust character recognizers are lacking. PDF (Portable Document Format) has emerged as one of the most popular document representation schemes in digital libraries, especially for storing scanned documents. When no textual (Unicode, ASCII) representation is available, scanned images are stored in the graphics stream of the PDF. In this paper, we describe a solution for searching, at the content level, the textual data in the graphics stream of PDF files. The proposed solution is demonstrated by enhancing an open source PDF viewer (Xpdf). Indian language support is also provided: users can type a word in Roman (ITRANS), view it rendered in a font, and simultaneously search the textual and graphics streams of the PDF.
