Abstract

There are two main techniques to convert written or printed text into digital format. The first technique is to create an image of written/printed text, but images are large in size so they require huge memory space to store, as well as text in image form cannot be undergo further processes like edit, search, copy, etc. The second technique is to use an Optical Character Recognition (OCR) system. OCR’s can read documents and convert manual text documents into digital text and this digital text can be processed to extract knowledge. A huge amount of Urdu language’s data is available in handwritten or in printed form that needs to be converted into digital format for knowledge acquisition. Highly cursive, complex structure, bi-directionality, and compound in nature, etc. make the Urdu language too complex to obtain accurate OCR results. In this study, supervised learning-based OCR system is proposed for Nastalique Urdu language. The proposed system evaluations under a variety of experimental settings apprehend 98.4% training results and 97.3% test results, which is the highest recognition rate ever achieved by any Urdu language OCR system. The proposed system is simple to implement especially in software front of OCR system also the proposed technique is useful for printed text as well as handwritten text and it will help in developing more accurate Urdu OCR’s software systems in the future.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call