Key word spotting using HMM in printed Telugu documents

D Nagasudha, Y Madhavee Latha

doi:10.1109/scopes.2016.7955797

Abstract

With the growth of multimedia technology and the internet, the storage and retrieval of documents has increased rapidly. The government has taken several measures to have documents scanned and stored digitally for future use. Even though the documents are available in digital format, it is very difficult to search them for a single word or phrase. Traditional optical character recognition (OCR) techniques and other text retrieval methods fail on these document images because of various types of noise. Word spotting helps users automatically search for a particular word or phrase in millions of such document images. In this paper we propose a word spotting technique for printed Telugu documents. A collection of document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent each word image. Correlation and a hidden Markov model (HMM) are applied to compare word images. Image-to-image matching is done by calculating the similarity between a query word image and each word image in the collection.
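The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which profile features are used, so the three below (projection, upper, and lower profiles) are assumptions chosen because they are common in word spotting. The function names, the fixed resampling length `n`, and the averaging of scores are also hypothetical. Only the correlation comparison is shown; the HMM comparison is not sketched here.

```python
from math import sqrt

# A word image is assumed to be a binary grid: rows of 0/1 values, 1 = ink.

def projection_profile(img):
    """Number of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]

def upper_profile(img):
    """Row index of the first ink pixel per column (image height if none)."""
    h = len(img)
    return [next((r for r in range(h) if img[r][c]), h)
            for c in range(len(img[0]))]

def lower_profile(img):
    """Row index of the last ink pixel per column (-1 if none)."""
    h = len(img)
    return [next((r for r in range(h - 1, -1, -1) if img[r][c]), -1)
            for c in range(len(img[0]))]

def resample(seq, n):
    """Nearest-neighbour resampling so profiles of different widths align."""
    m = len(seq)
    return [seq[min(m - 1, i * m // n)] for i in range(n)]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def similarity(query, candidate, n=32):
    """Mean correlation over the three profiles (averaging is an assumption)."""
    feats = (projection_profile, upper_profile, lower_profile)
    scores = [pearson(resample(f(query), n), resample(f(candidate), n))
              for f in feats]
    return sum(scores) / len(scores)

def rank(query, collection):
    """Indices of collection images, most similar to the query first."""
    return sorted(range(len(collection)),
                  key=lambda i: similarity(query, collection[i]),
                  reverse=True)
```

Calling `rank(query, collection)` returns the word images ordered by similarity to the query, which is the image-to-image retrieval the abstract describes. A practical system would first need the word segmentation step and some noise handling, neither of which is shown.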

Full Text