Abstract

Automatic character recognition has a wide variety of applications, such as automatic postal mail sorting, number plate recognition, automatic form reading, and text entry on PDAs. Automatic character recognition for cursive scripts is a complex process that faces issues not found in other scripts. Many solutions have been proposed in the literature to address the complexities of cursive script character recognition. This paper presents a comprehensive literature review of Optical Character Recognition (OCR) for off-line and on-line character recognition in Urdu, Arabic and Persian, based on the Hidden Markov Model (HMM). We survey almost all significant approaches proposed and outline future directions for OCR of cursive languages.

Highlights

  • Optical Character Recognition (OCR) converts images of text into text files

  • We describe the predominant applications of the Hidden Markov Model (HMM) to segmentation-free and segmentation-based recognition of cursive script, covering off-line and on-line handwritten ligatures, words and text lines

  • The proposed system was assessed on single-character Urdu ligatures and attained 98% accuracy on manually generated data and 96% accuracy on data scanned from several books and magazines

Summary

INTRODUCTION

Optical Character Recognition (OCR) converts images of text into text files. The main objective of OCR is to mimic the human ability to read, with high accuracy and speed. In cursive scripts, a letter takes different forms depending on its position within a word, which creates challenges for researchers at the segmentation stage of character recognition. Unlike Arabic, Urdu has a larger alphabet and some unique properties, and these peculiarities make OCR for Urdu more complex and challenging. The Nasta’liq font style adds further challenges because the language is written diagonally with no fixed baseline, no standards for slopes, context sensitivity caused by filled or false loops and character/ligature overlaps (Slimane et al., 2012). This diagonality, introduced by the Nasta’liq writing style, makes Urdu more complex for researchers in the field of OCR. The intra-ligature and inter-ligature overlapping in Urdu text, as in other Arabic-based text, adds to the challenges of segmentation and recognition (Naz, 2013). Segmentation techniques that rely on horizontal or vertical projections are not applicable to Nasta’liq, where ligatures overlap in those projections and lines are separated by only minor spacing.
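
The effect of overlapping projections can be illustrated with a small sketch. It is not taken from the surveyed papers: the function names `vertical_projection` and `split_at_gaps` and the two toy binary images are assumptions made for illustration only. The sketch cuts an image wherever a column contains no ink, which works when shapes are separated by blank columns but merges shapes that are stacked diagonally, as Nasta’liq ligatures often are.

```python
def vertical_projection(image):
    """Count ink pixels in each column of a binary image (1 = ink)."""
    return [sum(column) for column in zip(*image)]


def split_at_gaps(profile):
    """Return (start, end) column ranges separated by empty columns.

    `end` is exclusive. This is a toy gap-based segmenter, assumed here
    for illustration; it is not the method of any surveyed system.
    """
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # entering an inked run
        elif count == 0 and start is not None:
            segments.append((start, x))    # leaving an inked run
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


# Two shapes separated by a blank column: the profile has a gap.
separated = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Two shapes placed diagonally so that they share column 2: no gap.
overlapping = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]

separated_segments = split_at_gaps(vertical_projection(separated))
overlapping_segments = split_at_gaps(vertical_projection(overlapping))
print(separated_segments)
print(overlapping_segments)
```

In the second image the two shapes never touch, yet the column profile contains no empty column, so the gap-based split returns a single segment. This is the kind of failure that motivates segmentation-free approaches for diagonal, overlapping scripts.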

HMM BASED CURSIVE SCRIPT CHARACTER RECOGNITION
Statistical features
ADAB database
Findings
CONCLUSION
