Abstract

In a multilingual country like India, a document may contain text in more than one language. Reading such documents requires a multilingual Optical Character Recognition (OCR) system. It is therefore necessary to identify the different language regions of a document before passing them to the OCR systems of the individual languages. This paper proposes a procedure based on visual clues to identify the Kannada, Hindi and English text portions of an Indian multilingual document.

Highlights

  • Language identification is an important topic in automatic document analysis and recognition based on pattern recognition and image processing

  • Language identification may seem an elementary and simple task for humans in the real world, but it is difficult for a machine, primarily because different scripts are made up of differently shaped patterns that produce different character sets [4]

  • We focus on the first stage of the multilingual OCR system and present procedures for identifying and separating the Kannada, Hindi and English text portions of multilingual documents produced in Karnataka, an Indian state
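The two-stage arrangement described above (identify the language of each text portion first, then hand it to the OCR for that language) can be sketched as a simple dispatcher. This is only an illustrative sketch, not the authors' method: `TextRegion`, `make_stub_ocr`, `OCR_ENGINES` and `route_regions` are hypothetical names, the language labels are assumed to come from a separate identification stage, and the OCR engines are stubs that return placeholder strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TextRegion:
    """A text portion of the page, already labeled by the identification stage."""
    region_id: int
    language: str  # assumed labels: "kannada", "hindi" or "english"


def make_stub_ocr(language: str) -> Callable[[TextRegion], str]:
    """Build a hypothetical stand-in for a single-language OCR engine."""
    def recognize(region: TextRegion) -> str:
        return f"<{language} OCR output for region {region.region_id}>"
    return recognize


# One stub engine per language considered in the paper.
OCR_ENGINES: Dict[str, Callable[[TextRegion], str]] = {
    lang: make_stub_ocr(lang) for lang in ("kannada", "hindi", "english")
}


def route_regions(regions: List[TextRegion]) -> List[str]:
    """Send each region to the OCR engine that matches its language label."""
    results = []
    for region in regions:
        engine = OCR_ENGINES.get(region.language)
        if engine is None:
            raise ValueError(f"no OCR engine for language: {region.language!r}")
        results.append(engine(region))
    return results
```

Keeping the language label on the region and the engines in a lookup table means the identification stage and the recognition stage stay independent, so a further language could be supported by adding one table entry.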


Summary

Introduction

Language identification is an important topic in automatic document analysis and recognition based on pattern recognition and image processing. Identifying the language in a document image is of primary importance for selecting the appropriate OCR system when processing multilingual documents [3]. OCR is of special significance for a multilingual country like India, where the text portion of a document usually contains information in more than one language. A document containing text in more than one language is called a multilingual document. For such documents, it is essential to identify the language of each text portion before the contents can be analyzed. Individual OCR tools have been developed to deal best with only one

