Abstract

Natural language processing is one of the applications of Artificial Intelligence; it trains machines to work with human language. Optical character recognition (OCR) is one of its fields: it removes the effort of retyping by converting text images into editable text. An OCR system may include preprocessing and postprocessing stages that make the text image more suitable for the rest of the OCR process. Thinning is a preprocessing step in which characters, words and text are reduced to a one-pixel-wide skeleton. Much work has been done on OCR for various languages of the world, including Pakistani languages, but no work on Balti OCR exists. In this study, a thinning algorithm is proposed for Balti, a language spoken in the northern areas of Pakistan and India. The proposed algorithm was tested on hundreds of Balti language images and produced accurate results, giving a one-pixel skeleton of each input image; selected results are presented in this paper. The current research opens several directions, including the way forward to building Balti OCR and Balti ICR (both segmentation-based and segmentation-free).
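The abstract does not describe the proposed Balti thinning algorithm itself. To illustrate what reducing text to a one-pixel skeleton means, the sketch below implements the classic Zhang-Suen parallel thinning algorithm, a standard, general-purpose method that is *not* the algorithm proposed in this paper. It assumes a binary image given as a list of lists with 1 for ink and 0 for background, surrounded by a background border (border pixels are never examined). The function name `zhang_suen_thin` is hypothetical.

```python
def zhang_suen_thin(image):
    """Zhang-Suen thinning (not the paper's method): reduce a binary image
    (list of lists, 1 = ink, 0 = background) to a one-pixel-wide skeleton.
    Assumes the outermost rows/columns are background; they are not examined."""
    img = [row[:] for row in image]  # work on a copy, leave the input untouched
    rows, cols = len(img), len(img[0]) if img else 0

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above (r-1, c)
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # two sub-iterations per pass
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours
                    # a = number of 0 -> 1 transitions in the cyclic sequence P2..P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:  # removes south-east boundary and north-west corner pixels
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:          # removes north-west boundary and south-east corner pixels
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # deletions are applied after the scan, so each sub-iteration is parallel
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img
```

For example, a stroke three pixels thick (rows 2 to 4, columns 2 to 9 of a 7 by 12 image) thins to a single horizontal line of pixels along its middle row. Deleting pixels in two alternating sub-iterations, rather than all at once, is what keeps the skeleton connected instead of eroding it away.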
