An Innovative Approach for Automatic Character Recognition of Indian Languages

doi:10.36348/merjet.2021.v01i01.005

Abstract

Development of OCR systems for Indian scripts is an active area of research today. Optical character recognition (OCR) is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text. In simple words, OCR is a visual recognition process that turns printed or written text into an electronic character-based file. OCR is a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus on OCR has shifted to the implementation of proven techniques. A lot of work has been carried out on OCR internationally, but in the Indian context a concrete approach to character recognition is still required, because the scripts of Indian languages are among the most complex scripts and are very hard to recognize. Indian scripts present great challenges to an OCR designer due to the large number of letters in the alphabet, the sophisticated ways in which they combine, and the complicated graphemes that result. The problem is compounded by the unstructured manner in which popular fonts are designed. There is a lot of common structure in the different Indian scripts. None of the existing OCR systems developed for the various Indian scripts provides sufficient efficiency, owing to various factors. The objective of this paper is to discuss a more efficient character recognition technique. This paper introduces a new technical approach to recognizing Indian script characters that other OCR systems fail to recognize reliably because of various problems.

Full Text