Abstract

Handwritten text recognition is regarded as among the most challenging tasks in the research community because characters in handwritten documents often differ only slightly in shape. The lack of a standard dataset makes the problem harder still for researchers to work on. To address these problems, this paper presents an optical character recognition system for offline Pashto characters. The lack of a standard handwritten Pashto character database is addressed by developing a medium-sized database of offline Pashto characters. This database consists of 11,352 character images (258 samples for each of the 44 characters in the Pashto script). Histogram of oriented gradients and zoning-based density features are used for feature extraction from the carved Pashto characters. K-nearest neighbors is used as the classifier on the proposed feature sets. Using 10-fold cross-validation, the system achieves an accuracy of 80.34% with histogram of oriented gradients features and 76.42% with zoning-based density features.
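As a rough illustration of the zoning-based density features and the k-nearest neighbors classifier named in the abstract, the sketch below splits a binary character image into a grid of zones, uses the fraction of ink pixels in each zone as a feature, and classifies a query by majority vote among its nearest training vectors. The function names, the grid size, the value of k, and the Euclidean distance are all assumptions made for this example; the abstract does not state the settings the authors used.

```python
from collections import Counter
import math


def zoning_density(image, zones_per_side=4):
    """Return the fraction of foreground pixels in each zone of a binary image.

    `image` is a list of equal-length rows holding 0 (background) or 1 (ink).
    The grid size is an assumption for illustration, not a value from the paper.
    """
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones_per_side):
        r0 = zr * height // zones_per_side
        r1 = (zr + 1) * height // zones_per_side
        for zc in range(zones_per_side):
            c0 = zc * width // zones_per_side
            c1 = (zc + 1) * width // zones_per_side
            area = (r1 - r0) * (c1 - c0)
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            features.append(ink / area if area else 0.0)
    return features


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    Euclidean distance is assumed here; other metrics are equally possible.
    """
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With a 2 x 2 grid, a 4 x 4 image containing a single vertical stroke in its first column yields the feature vector `[0.5, 0.0, 0.5, 0.0]`, since only the two left-hand zones contain ink.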

Highlights

  • In this modern digital age of ever-growing computer technology, machine learning algorithms play a key role in all fields of life, especially in the areas of text recognition [1], network security [2, 3], privacy [4], traffic flow prediction [5], object detection [6], and many others

  • For feature extraction, we have proposed HoG and zoning techniques. These techniques capture the distinguishing numerical values of the characters. The classification and recognition phase is completed using a k-nearest neighbors (k-NN) classifier based on the accumulated feature map produced by the HoG and zoning techniques

  • Results are calculated for the proposed system based on a zoning-based density feature set and a histogram of oriented gradients (HoG) feature set
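The HoG feature set mentioned in the highlights can be sketched in a simplified form as follows. This is only an approximation under stated assumptions: gradients are taken with central differences inside each cell, orientations are unsigned (0 to 180 degrees), and the whole descriptor is L2-normalised once. Standard HoG implementations compute gradients over the full image and normalise over overlapping blocks, and the cell size, bin count, and function names here are hypothetical choices, not the authors' settings.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a list of equal-length rows of grayscale intensities.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(height):
        for c in range(width):
            # Central differences, clamped at the cell border (a simplification).
            gx = cell[r][min(c + 1, width - 1)] - cell[r][max(c - 1, 0)]
            gy = cell[min(r + 1, height - 1)][c] - cell[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def hog_descriptor(image, cell_size=4, bins=9):
    """Concatenate per-cell histograms and L2-normalise the result."""
    height, width = len(image), len(image[0])
    descriptor = []
    for r0 in range(0, height - cell_size + 1, cell_size):
        for c0 in range(0, width - cell_size + 1, cell_size):
            cell = [row[c0:c0 + cell_size] for row in image[r0:r0 + cell_size]]
            descriptor.extend(hog_cell_histogram(cell, bins))
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm else descriptor
```

The resulting vector could be passed to any distance-based classifier, such as the k-NN classifier the highlights describe. A vertical edge places all of its gradient energy in the first orientation bin, while a horizontal edge falls in the bin covering 80 to 100 degrees.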


Summary

Introduction

In this modern digital age of ever-growing computer technology, machine learning algorithms play a key role in all fields of life, especially in the areas of text recognition [1], network security [2, 3], privacy [4], traffic flow prediction [5], object detection [6], and many others. One of the major applications of machine learning algorithms is the development of Optical Character Recognition (OCR) systems. The Pashto language has incorporated most of the Arabic, Urdu, and Persian letters with some minor modifications. Several research works have addressed the automatic recognition of multiple languages such as Arabic, English, Persian, Chinese, and Urdu [7, 8]. Because it incorporates these letters, the Pashto script is cursive in nature. The Pashto language has a larger character set (44 characters) than Urdu (38 characters), Arabic (28 characters), and Persian (32 characters). Boufenar et al. [10, 11] presented an artificial
