Recognition-free search in graphics stream of PDF

A R Balasubramanian, C V Jawahar

doi:10.3233/wdl-120016

Abstract

Digital libraries are becoming an integral part of our day-to-day life. Digitized books and manuscripts in many of these digital libraries are often stored as images or graphics. Very often, they cannot be searched at the content level because robust character recognizers are lacking. PDF (Portable Document Format) has emerged as one of the most popular document representation schemas in digital libraries, especially for storing scanned documents. When no textual (Unicode, ASCII) representation is available, scanned images are stored in the graphics stream of the PDF. In this paper, we describe a solution for searching textual data in the graphics stream of PDF files at the content level. The proposed solution is demonstrated by enhancing an open source PDF viewer (Xpdf). Indian language support is also provided. Users can type a word in Roman (ITRANS), view it in a font, and search the textual and graphics streams of the PDF simultaneously.


Journal: World Digital Libraries-An International Journal
Publication Date: Jan 1, 2008
Citations: 1

