OCR for Devanagari numerals using zonal histogram of angle

Kent Johnson, Kumar Gourav, Gaurav Gaurav, Dwijen Rudrapal, Sanjib Debnath

doi:10.1080/09720510.2017.1395172

Abstract

An Optical Character Recognition (OCR) system generates a textual representation of handwritten or printed text. Over the past few decades, considerable research on OCR has been carried out for most Indian scripts. Devanagari is one of the most widely used scripts in India and in the world. Despite this research, a robust OCR system for the Devanagari script is still lacking. The aim of this paper is to build an OCR system that classifies handwritten Devanagari numerals. The paper proposes an OCR approach based on a histogram of the angles made by dark pixels with the zonal center of mass: for each zone, the feature bins the angle that each dark pixel makes about that zone's center of mass. This newly extracted feature was used to train several classification algorithms, including K-Nearest Neighbor, SVM, Linear SVM, Random Forest, Decision Tree, Gradient Boosting, and Gaussian Naive Bayes. We report the efficiency of each algorithm with the new feature. Our experimental results show that the Random Forest model outperforms the other algorithms, with an efficiency of 92.57%.
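
The feature described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `zonal_angle_histogram`, the 2x2 zone grid, the 8 angle bins, the darkness threshold, and the per-zone normalization are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def zonal_angle_histogram(image, zones=(2, 2), bins=8, threshold=0.5):
    """Histogram of angles of dark pixels about each zone's center of mass.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    img = np.asarray(image, dtype=float)
    # Assumption: dark (ink) pixels have intensity below `threshold`.
    dark = img < threshold
    features = []
    # Split the image into a grid of zones[0] x zones[1] zones.
    for row_block in np.array_split(dark, zones[0], axis=0):
        for zone in np.array_split(row_block, zones[1], axis=1):
            ys, xs = np.nonzero(zone)
            if ys.size == 0:
                # A zone without dark pixels contributes an all-zero histogram.
                features.append(np.zeros(bins))
                continue
            # Center of mass of the dark pixels in this zone.
            cy, cx = ys.mean(), xs.mean()
            # Angle of each dark pixel about the center of mass, in [-pi, pi].
            angles = np.arctan2(ys - cy, xs - cx)
            hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
            # Assumption: normalize by pixel count so stroke thickness
            # does not dominate the feature.
            features.append(hist / ys.size)
    return np.concatenate(features)
```

With these assumed defaults, each image yields a vector of 2 x 2 x 8 = 32 values. Such a vector could then be passed to any of the classifiers named in the abstract, for example scikit-learn's `RandomForestClassifier`, although the abstract does not state which library was used.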

Full Text