Abstract

This paper presents a font-size-independent Optical Character Recognition (OCR) system for Urdu document images. Urdu documents are written in the Noori Nastalique writing style, with different font sizes for normal text and headings. Most current state-of-the-art Urdu OCR techniques support recognition of text in a single font size only. The presented study deals with the recognition of Nastalique text in font sizes 14 to 28. Three recognizers are developed at three font sizes (called pivots): 14, 16 and 22. Urdu document images in the remaining font sizes, such as 18, 20, 24, 26 and 28, are resized to the nearest pivot font size using nearest-neighbor interpolation so that they can be recognized. A detailed analysis has been carried out to compute the optimal scaling factor for each font size in order to improve recognition results. It has been observed that the recognizers perform better on images resized with the optimal scaling factors than on images resized with simply computed scaling factors. The system is developed and matured on 1,965 main body classes covering 59,974 high-frequency Urdu words. After maturation, the system has 97.20%, 97.08%, 95.13%, 95.65%, 96.26%, 96.52%, 95.78%, 96.38%, 96.66% main body recognition accuracy for 14, 16, 18, 20, 24, 26, 28 font sizes respectively.
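The pivot-based resizing step can be illustrated with a minimal sketch. It assumes that the "simple computed" scaling factor is the ratio of the pivot font size to the source font size, and that ties between two equally distant pivots go to the smaller one; neither detail is stated in the abstract. The function names are hypothetical, and the empirically tuned optimal scaling factors are not given here, so they are not reproduced.

```python
PIVOT_SIZES = (14, 16, 22)  # pivot font sizes named in the abstract


def nearest_pivot(font_size):
    """Return the pivot font size closest to font_size.

    Assumption: on a tie, the smaller pivot wins (min keeps the first match).
    """
    return min(PIVOT_SIZES, key=lambda pivot: abs(pivot - font_size))


def simple_scaling_factor(font_size):
    """Assumed 'simple' factor: nearest pivot size divided by source size."""
    return nearest_pivot(font_size) / font_size


def resize_nearest_neighbor(image, scale):
    """Resize a 2-D image (a list of equal-length rows) by nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    # Map each destination index back to a source index, clamped to the image.
    rows = [min(int(y / scale), src_h - 1) for y in range(dst_h)]
    cols = [min(int(x / scale), src_w - 1) for x in range(dst_w)]
    return [[image[r][c] for c in cols] for r in rows]


def resize_to_pivot(image, font_size, scale=None):
    """Resize an image of the given font size toward its nearest pivot.

    Pass `scale` to override the simple factor, e.g. with a tuned value.
    """
    if scale is None:
        scale = simple_scaling_factor(font_size)
    return nearest_pivot(font_size), resize_nearest_neighbor(image, scale)
```

Under these assumptions, a size-18 image would be routed to the size-16 recognizer and sizes 20 through 28 to the size-22 recognizer. Passing an explicit `scale` to `resize_to_pivot` is where a tuned factor would replace the simple ratio.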
