Abstract

Optical Character Recognition (OCR) is a technology that lets machines read the text in an image and convert it into editable text. Work on languages written with isolated characters, such as German, English and French, is well advanced. Research on OCR and Intelligent Character Recognition (ICR) for Sindhi script, by contrast, is still at an early stage, and little work has been reported in this area even though the Sindhi language is rich in culture and history. This paper presents one of the initial steps in recognizing Sindhi handwritten characters, and it addresses isolated characters only. Several subjects were asked to write Sindhi characters in unconstrained form, and the written samples were collected and scanned with a flatbed scanner. The scanned documents were preprocessed by binary conversion and pepper-noise removal, and the text lines were segmented using the horizontal projection profile. Characters were then extracted from the segmented lines by vertical projection. Features were extracted from the characters by zoning, and a neural network was used for classification. The recognized characters were converted into editable text with an average accuracy of 85%.
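The segmentation and zoning steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the fixed binarization threshold of 128 and the 4x4 zoning grid are all assumptions chosen for illustration, and the noise-removal and neural-network stages are left out.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Map a grayscale page to 1 = ink, 0 = background (threshold is an assumption)."""
    return (np.asarray(gray) < threshold).astype(np.uint8)

def runs(mask):
    """Return (start, end) pairs, end exclusive, for each run of True in a 1-D mask."""
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]

def segment_lines(binary):
    """Split a page into text lines wherever the horizontal projection is non-zero."""
    profile = binary.sum(axis=1)  # ink pixels per row
    return [binary[top:bottom] for top, bottom in runs(profile > 0)]

def segment_characters(line):
    """Split a line into isolated characters using the vertical projection."""
    profile = line.sum(axis=0)  # ink pixels per column
    return [line[:, left:right] for left, right in runs(profile > 0)]

def zoning_features(char, grid=(4, 4)):
    """Ink density in each zone of a grid laid over the character's bounding box."""
    rows = np.flatnonzero(char.any(axis=1))
    cols = np.flatnonzero(char.any(axis=0))
    box = char[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    feats = []
    for band in np.array_split(box, grid[0], axis=0):
        for zone in np.array_split(band, grid[1], axis=1):
            feats.append(float(zone.mean()) if zone.size else 0.0)
    return np.array(feats)

# Toy page: two "characters" on the first line, one on the second.
page = np.zeros((9, 12), dtype=np.uint8)
page[1:4, 1:3] = 1
page[1:4, 5:8] = 1
page[6:8, 2:6] = 1

lines = segment_lines(page)
chars = [segment_characters(line) for line in lines]
features = zoning_features(chars[0][0], grid=(2, 2))
```

Each feature vector would then be passed to a classifier. Note that a plain projection split treats any blank row or column as a boundary, so a dot that sits clear above or below its base stroke could be cut off as a separate line; a real system would likely need a minimum-gap rule or a merging step.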

Highlights

  • An Optical Mark Recognizer is commonly used to assess candidates in university entry tests and in other objective-type tests in various job examinations

  • A lot of work has been done in other areas of Sindhi computing, but very little has been done on Sindhi script and its recognition

  • Work on handwritten character recognition is still in its infancy. This is the first step towards the recognition of Sindhi handwritten words, sentences and text images

Summary

INTRODUCTION

An Optical Mark Recognizer is commonly used to assess candidates in university entry tests and in other objective-type tests in various job examinations. OCR recognizes the text in an image and is considered a faster method of text input. Many systems, such as a Sindhi dictionary [2-3], a Sindhi Unicode-based word processor [4], Sindhi OCR [57], and Sindhi text image databases, have been proposed, but to the best of our knowledge no work has yet been conducted on Sindhi handwritten character recognition. The recognition of handwritten Sindhi characters is an effort to open a new window of research for researchers working on pattern recognition, and with minor modification the generalized algorithms can be used with other languages that adopt the Arabic script.

RELATED MATERIAL
Peculiarities in Sindhi Character Recognition
Feature Extraction
Recognition
Findings
CONCLUSION