Abstract

Pre-processing is the initial and vital phase in optical character recognition. Segmentation deals with the extraction of individual components from a document image. A number of techniques, such as projection profiles, connected components, and gaps between characters or components, are reported in the literature for component extraction, followed by feature extraction and recognition of the individual components. These techniques give good results if the components are isolated but fail if the components are touching, shadowed, or skewed. A novel technique is required to address such issues and enhance the recognition rate. The problem of segmentation for cursive handwriting in Roman script has been addressed by various authors, but it has not been adequately addressed for Indian scripts, especially Devanagari. This paper is a review confined to the offline handwritten script domain. It attempts to review various techniques for character segmentation that consider touching characters in offline handwritten words in Devanagari script and in scripts sharing similar characteristics (such as Bangla and Gurumukhi), along with the databases used and the accuracies reported in the literature.
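
As an illustration of why gap-based methods work on isolated components but fail on touching ones, here is a minimal sketch of vertical projection-profile segmentation. It is not taken from any of the reviewed papers; the function names and the tiny binary images are hypothetical, and the image is assumed to be a list of rows with ink pixels marked as 1.

```python
def column_profile(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def segment_by_gaps(image):
    """Return (start, end) column ranges separated by empty columns.

    Each range is half-open: columns start .. end - 1 contain ink.
    """
    profile = column_profile(image)
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a component begins here
        elif count == 0 and start is not None:
            segments.append((start, x))    # an empty column closes it
            start = None
    if start is not None:                  # component runs to the right edge
        segments.append((start, len(profile)))
    return segments


# Hypothetical toy images: two components separated by empty columns,
# and the same two components joined by a stroke in the middle row.
isolated = [
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
]
touching = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
]

print(segment_by_gaps(isolated))  # two separate column ranges
print(segment_by_gaps(touching))  # a single range spanning both components
```

In the second image the joining stroke leaves no empty column, so the profile never drops to zero and the two components come back as one segment. This is the failure case that touching-character techniques aim to handle.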

Highlights

  • OCR (Optical Character Recognition) is a conversion process that converts printed or handwritten data in the form of an image, online or offline, into machine-encoded form

  • Various techniques appear in a number of research papers on offline handwritten character recognition for Latin script and for Asian languages, but only a few papers are available for Devanagari script (Hindi)

  • Conclusion and future directions: A number of techniques have been proposed for segmenting text into its constituent components, and authors have used their own self-created databases to test their proposed techniques. The unavailability of a benchmark database is the major challenge faced by researchers in optical character recognition


Summary

Introduction

OCR (Optical Character Recognition) is a conversion process that converts printed or handwritten data in the form of an image, online or offline, into machine-encoded form. ICR (Intelligent Character Recognition) is more precise than OCR because the computer system is trained to learn different styles and fonts; its major application is automated form processing. It has major advantages in terms of speed, accuracy, and cost. Segmentation-based or holistic approaches are used in the literature for the recognition of Devanagari script. Various techniques appear in a number of research papers on offline handwritten character recognition for Latin script and for Asian languages, but only a few papers are available for Devanagari script (Hindi). A comprehensive bibliography of the most relevant papers on the segmentation of offline handwritten scripts is included to provide an outline of developments in the field.
