SEGMENTATION-FREE RECOGNITION OF URDU SCRIPT USING HMM

Prabjot Singh

doi:10.26483/ijarcs.v9i1.5500

Abstract

Urdu literature exists predominantly as manuscripts and typewritten books, and there is a need to convert these physical libraries into electronic ones. OCR systems have been developed for many languages and are widely used. Building a complete Urdu OCR is difficult because Urdu is a highly cursive language in which ligatures overlap and style variation poses challenges to the recognition system. We describe a technique for automatic recognition of off-line printed Urdu text using Hidden Markov Models (HMMs). Our method does not require segmentation into characters and treats each shape of an Urdu character as a separate class, resulting in a total of 196 classes (compared to 38 Urdu letters). This paper presents a novel feature extraction method based on a sliding-window technique that uses only 16 statistical features from each window, thereby eliminating the need to segment Urdu text. We report how the recognition rate depends on the number of HMM states, the size of the hierarchical window, and the font. We use HTK (Hidden Markov Model Toolkit) for training, recognition, and result analysis.
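The sliding-window idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which 16 statistical features are used, so the example below assumes one plausible choice: the ink density in each cell of a 4 x 4 grid laid over the window. The function names, the window width, the step size, and the right-to-left scan order are all illustrative assumptions, not the authors' implementation.

```python
def cell_bounds(length, parts):
    """Split range(length) into `parts` nearly equal (start, stop) pairs."""
    return [(i * length // parts, (i + 1) * length // parts) for i in range(parts)]


def window_features(line_img, right, win_width, rows=4, cols=4):
    """Ink density in each cell of a rows x cols grid over one window.

    ASSUMPTION: grid densities stand in for the paper's 16 statistical
    features, which are not specified in the abstract.
    """
    height = len(line_img)
    left = right - win_width
    feats = []
    for r0, r1 in cell_bounds(height, rows):
        for c0, c1 in cell_bounds(win_width, cols):
            pixels = [line_img[y][left + x]
                      for y in range(r0, r1) for x in range(c0, c1)]
            feats.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return feats


def sliding_window_features(line_img, win_width=4, step=2):
    """One 16-dimensional vector per window position.

    line_img is a binarized text line (list of rows, 1 = ink, 0 = background).
    Windows are scanned from the right edge to the left, following the
    reading direction of Urdu (an assumption about the scan order).
    """
    width = len(line_img[0])
    return [window_features(line_img, right, win_width)
            for right in range(width, win_width - 1, -step)]


# Toy 8 x 10 "text line" with ink only in the two rightmost columns.
img = [[0] * 8 + [1] * 2 for _ in range(8)]
feats = sliding_window_features(img)
```

Each row of `feats` is one observation frame. The whole line becomes a frame sequence that an HMM can consume directly, which is why no character boundaries have to be found beforehand.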

Highlights

Optical character recognition, abbreviated as OCR, is the technique that converts scanned images of handwritten, typewritten or printed text into a machine-encoded form that can be processed, edited, searched, saved, and copied an unlimited number of times on a computer without any degradation or loss of information.
Results on different fonts: Five different Urdu fonts were used for recognition and testing. Table 4 summarizes the results for the Akhbar, Andalus, Naskh and Arial fonts.
Future work: Many extensions are possible, either to enhance the performance of the system or to make the approach applicable to a wider range of tasks related to Urdu text recognition.

Summary

Introduction

Optical character recognition, abbreviated as OCR, is the technique that converts scanned images of handwritten, typewritten or printed text into a machine-encoded form that can be processed, edited, searched, saved, and copied an unlimited number of times on a computer without any degradation or loss of information. Segmenting the script into characters is a difficult and complex procedure that is prone to errors, which results in low recognition rates. The method presented here does not require segmentation into characters and is applied to cursive Urdu script, where ligatures, overlaps and style variation pose challenges to the recognition system. Character recognition for Urdu script is challenging mainly because of its cursive nature, multiple fonts, context-dependent character shapes, and the position of characters with respect to the baseline. These obstacles have played an important role in delaying character recognition.
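To see why an HMM can avoid explicit segmentation, consider a toy sketch in which two character models are chained left to right and Viterbi decoding recovers the character boundaries as a by-product of recognition. Everything here is an illustrative assumption: the two-state models, the discrete observation symbols, the probabilities, and the placeholder labels "A" and "B". HTK works with continuous feature vectors, so this shows the principle rather than the authors' setup.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(log_start, log_trans, log_emit, observations):
    """Most likely state path for a discrete-observation HMM (log domain)."""
    n = len(log_start)
    score = [log_start[s] + log_emit[s][observations[0]] for s in range(n)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + log_trans[p][s])
            score.append(prev[best] + log_trans[best][s] + log_emit[s][obs])
            pointers.append(best)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical setup: states 0-1 model character "A", states 2-3 model "B".
N = 4
start = [log(1.0)] + [NEG_INF] * (N - 1)        # must begin in state 0
trans = [[NEG_INF] * N for _ in range(N)]
for s in range(N - 1):
    trans[s][s] = log(0.5)                      # stay in the same state
    trans[s][s + 1] = log(0.5)                  # advance to the next state
trans[N - 1][N - 1] = log(1.0)                  # final state absorbs
# Each state prefers "its own" symbol (0.7) over the other three (0.1 each).
emit = [[log(0.7 if sym == s else 0.1) for sym in range(N)] for s in range(N)]

frames = [0, 0, 1, 2, 3, 3]                     # quantized window features
path = viterbi(start, trans, emit, frames)
labels = ["AB"[s // 2] for s in path]           # map states back to characters
```

In this sketch the point where `labels` switches from "A" to "B" is the character boundary. It is found during decoding, with no separate segmentation step.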


Journal: International Journal of Advanced Research in Computer Science
Publication Date: Feb 20, 2018
License type: CC BY
