Abstract

In this paper, we present an efficient and robust technique for the recognition of offline Roman characters. The main strategy is to extract statistical and similarity features by combining the grey level co-occurrence matrix (GLCM) with the complementary similarity measure (CSM). The CSM is used to extract features from binary images, and these are combined with GLCM features to boost recognition accuracy. Recognition is performed with four different classifiers: an artificial neural network (ANN), a Naive Bayes classifier, a random forest (RF) and a support vector machine (SVM). Experiments were carried out on a standard dataset, in both clean and noisy versions. The technique achieves an accuracy of 100% for some characters without noise and 94.11% with impulsive noise. The four classifiers are compared in both clean and noisy conditions. On the clean dataset, the random forest provides the best average recognition accuracy, 84.9% over all characters. On low-noise datasets, the random forest and the artificial neural network have almost the same recognition accuracy, and on high-noise datasets the SVM provides the highest recognition accuracy.
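As an illustration of the statistical half of the feature set, the sketch below builds a normalized GLCM for a small grey-level image and derives three common texture descriptors from it. It is a minimal sketch and not the authors' implementation: the function names `glcm` and `glcm_features`, the pixel offset, the symmetric counting and the choice of contrast, energy and homogeneity are all assumptions made for this example, since the abstract does not say which GLCM settings or descriptors were used.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized grey level co-occurrence matrix for one pixel offset.

    image  : list of rows, each a list of ints in range(levels)
    dx, dy : column and row offset between the two pixels of a pair
             (assumed here; the source does not state the offsets used)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, energy and homogeneity of a normalized GLCM (illustrative choice)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Toy 4x4 image with four grey levels (hypothetical data).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(glcm(toy, levels=4))
```

In a pipeline like the one described, a vector of such descriptors would be joined with the CSM-based similarity features before being passed to a classifier.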

Highlights

  • Optical character recognition (OCR) is an automated process for reading printed or handwritten text

  • The support vector machine (SVM) outperforms the other classifiers at high levels of impulsive noise

  • Four classifiers, viz. SVM, Naive Bayes, random forest and neural network, are studied for the classification of offline alphabetic characters
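The noisy test conditions mentioned above can be pictured with a small sketch that corrupts a binary character image with impulsive (salt-and-pepper) noise. This is only an illustration under stated assumptions: the helper name `add_impulse_noise`, the `density` parameter and the way corrupted pixels are redrawn are hypothetical, and the source does not specify how its noise was generated or which noise levels count as low or high.

```python
import random


def add_impulse_noise(image, density, seed=None):
    """Return a copy of a binary image with impulsive noise applied.

    image   : list of rows, each a list of 0/1 pixel values
    density : probability that a given pixel is hit by an impulse
              (assumed parameterisation, not taken from the source)
    seed    : optional seed so that experiments can be repeated
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # leave the input untouched
    for row in noisy:
        for c in range(len(row)):
            if rng.random() < density:
                # An impulse forces the pixel to black or white at random.
                row[c] = rng.choice((0, 1))
    return noisy


# Hypothetical 5x5 binary glyph.
glyph = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
low_noise = add_impulse_noise(glyph, density=0.05, seed=0)
high_noise = add_impulse_noise(glyph, density=0.5, seed=0)
```

Sweeping `density` from small to large values and re-running each classifier on the corrupted images is one way to reproduce the kind of clean, low-noise and high-noise comparison that the highlights describe.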


Summary

Introduction

Optical character recognition (OCR) is an automated process for reading printed or handwritten text. OCR is the electronic conversion of book text, administrative records, office files, marriage records, security records and many other important printed documents into machine-encoded text. Machine-encoded text takes less storage space than an image and makes it easier to format, edit and display the text properly [1, 2]. Feature extraction is one of the most important steps for matching or classification. Feature extraction techniques are divided into two groups: linear and nonlinear. The linear extraction techniques are principal component analysis (PCA) [3], independent component analysis (ICA) [4], linear discrim-
