Abstract

With the growth of multimedia technology and the internet, the storage and retrieval of documents has increased rapidly. Government has taken several measures to have documents scanned and stored digitally for future use. Even though the documents are available in digital format, it is very difficult to search them for a single word or phrase. Traditional optical character recognition (OCR) techniques and other text retrieval methods fail on these document images because of various types of noise. Word spotting helps users automatically search for a particular word or phrase in millions of such document images. In this paper we propose a word spotting technique for printed Telugu documents. A collection of document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent each word image. Correlation and a hidden Markov model (HMM) are applied to compare word images. Image-to-image matching is done by calculating the similarity between a query word image and each word image in the collection.
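
The matching step described above can be illustrated with a minimal sketch. The abstract does not list which profile features are used, so the three chosen here (vertical projection, upper profile, lower profile) are assumptions based on features commonly used in word spotting, and the function names `profile_features`, `correlation_score`, and `spot` are hypothetical. The sketch covers only the correlation comparison, not the HMM, and assumes word images are already segmented and binarized with nonzero pixels as ink.

```python
import numpy as np

def profile_features(word, n_cols=64):
    """Return a (3, n_cols) matrix of column-wise profiles for a binary word image.

    Assumed features (not confirmed by the paper): projection, upper and lower profiles.
    """
    ink = np.asarray(word) > 0
    h, w = ink.shape
    has_ink = ink.any(axis=0)
    # Fraction of ink pixels in each column.
    projection = ink.sum(axis=0) / h
    # Distance from the top edge to the first ink pixel (1.0 for empty columns).
    upper = np.where(has_ink, ink.argmax(axis=0), h) / h
    # Distance from the bottom edge to the last ink pixel (1.0 for empty columns).
    lower = np.where(has_ink, ink[::-1].argmax(axis=0), h) / h
    # Resample every profile to a fixed length so words of different widths compare.
    src = np.linspace(0.0, 1.0, w)
    dst = np.linspace(0.0, 1.0, n_cols)
    return np.stack([np.interp(dst, src, f) for f in (projection, upper, lower)])

def correlation_score(fa, fb):
    """Normalized correlation between two feature matrices of equal shape."""
    a = fa.ravel() - fa.mean()
    b = fb.ravel() - fb.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def spot(query, collection, top_k=5):
    """Rank word images in `collection` by similarity to `query`, best first."""
    q = profile_features(query)
    scores = [correlation_score(q, profile_features(img)) for img in collection]
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), scores[i]) for i in order]
```

Calling `spot(query_image, word_images)` returns `(index, score)` pairs, so the best-matching word images can be mapped back to their positions in the source documents. Resampling to a fixed width is a simple stand-in for width normalization; an elastic alignment could replace it where word widths vary strongly.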
