
Savita Pal Godara, Pratap Singh Patwal

doi:10.14445/22492593/ijcot-v6p308

Abstract

Document image analysis is the set of techniques used to obtain a computer-readable description of a document image from its pixel data. One document image analysis product is Optical Character Recognition (OCR) software, which recognizes the text in a scanned document image and so makes it possible for the user to edit or search the document's contents. In this paper, we propose a novel method for identifying Latin text in Devanagari script document images. Many Devanagari documents contain English text alongside the Devanagari, sometimes on a single page. In such bilingual documents, the two scripts are generally mixed together within a single text line. Existing methods can recognize each of the two scripts, but they lack the ability to recognize multiple scripts mixed within a single text line.
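The abstract does not describe how the proposed method separates the two scripts. Purely as an illustration of the problem, the sketch below uses one cue that is common in Devanagari script identification: Devanagari words are joined by a horizontal headline (the shirorekha), which appears as a nearly full row of ink in a word's horizontal projection profile, while Latin words usually have no such row. This is not the authors' method. The binarized list-of-lists input, the functions `split_words`, `has_headline` and `classify_words`, and the thresholds `min_gap` and `min_coverage` are all hypothetical choices made for this example.

```python
from typing import List

# Assumed input format: a binarized text-line image as rows of 0/1 pixels, 1 = ink.
Bitmap = List[List[int]]


def split_words(line: Bitmap, min_gap: int = 3) -> List[Bitmap]:
    """Split a text line into word images wherever at least `min_gap`
    consecutive columns contain no ink (vertical projection profile)."""
    height = len(line)
    width = len(line[0]) if height else 0
    col_ink = [sum(line[r][c] for r in range(height)) for c in range(width)]

    words: List[Bitmap] = []
    start, end, gap = None, 0, 0
    for c, ink in enumerate(col_ink):
        if ink:
            if start is None:
                start = c  # first inked column of a new word
            end, gap = c, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # gap wide enough: close the current word
                words.append([row[start:end + 1] for row in line])
                start, gap = None, 0
    if start is not None:  # close a word that runs to the right edge
        words.append([row[start:end + 1] for row in line])
    return words


def has_headline(word: Bitmap, min_coverage: float = 0.8) -> bool:
    """Approximate shirorekha test: True if some row is inked across at
    least `min_coverage` of the word's width (horizontal projection peak)."""
    width = len(word[0]) if word else 0
    if width == 0:
        return False
    densest_row = max(sum(row) for row in word)
    return densest_row / width >= min_coverage


def classify_words(line: Bitmap, min_gap: int = 3,
                   min_coverage: float = 0.8) -> List[str]:
    """Label each word in a mixed-script line as 'devanagari' or 'latin'."""
    return ["devanagari" if has_headline(w, min_coverage) else "latin"
            for w in split_words(line, min_gap)]
```

A toy line with one headline word and one word without a headline, separated by four blank columns, is split into two words and labelled `["devanagari", "latin"]`. A practical system would also need noise removal, skew correction and tuned thresholds, and this rule alone would mislabel short Latin words that contain a full-width horizontal stroke, such as a capital T.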
