Abstract

Text recognition research in artificial intelligence has enabled machines not only to recognize human languages but also to interpret them. Optical character recognition (OCR) is a subarea of AI that converts scanned text images into editable documents. Researchers have proposed various text recognition techniques for cursive and connected scripts written from right to left, but recognizing such scripts correctly remains a challenging problem for visual methods. Balochi, a regional language of Pakistan, is written in one of these scripts; it is spoken by a large population, yet no research has been conducted on its recognition. In this paper, we propose a convolutional neural network (CNN) based model for recognizing non-cursive Balochi characters. Our model is an optimized small VGGNet that achieves exceptional precision and speed compared with state-of-the-art machine learning methods. We compared the proposed method experimentally with a baseline LeNet model; the results show that the proposed method improves on the baseline, reaching a precision of 96%. We also collected and processed a dataset of Balochi characters and made it public to support further research.

Highlights

  • Manipulating scanned document images remains a challenging task for machines because the images are stored in a pixel format known as raster graphics

  • Text recognition is more straightforward for non-cursive scripts, such as Latin, than for cursive scripts

  • Existing methods offer various solutions for character recognition in cursive, right-to-left languages, notably Arabic, Farsi and Urdu, but we found no research on the widely spoken Balochi language


Summary

Introduction

Manipulating scanned document images remains a challenging task for machines because the images are stored in a pixel format known as raster graphics. Optical character recognition (OCR) techniques transform printed and handwritten data into a digital format that machines can process further. Text recognition is more straightforward for non-cursive scripts, such as Latin, than for cursive scripts. Researchers have proposed approaches for identifying both forms of script. Recognition of cursive and connected scripts still needs considerable attention, since OCR for such scripts remains an open research problem. Character recognition is also expanding to cover other spoken and written regional languages around the globe.
