Recognition of Degraded and Non Degraded Roman Characters Using Different Classifiers

Deval Verma, Himanshu Agarwal, A K Aggarwal

doi:10.13189/ujeee.2019.060510

Open Access

https://doi.org/10.13189/ujeee.2019.060510

Abstract

In this paper, we present an efficient and robust technique for recognizing offline Roman characters. The main strategy is to extract statistical and similarity features by combining the grey level co-occurrence matrix (GLCM) with the complementary similarity measure (CSM). The CSM method extracts features from binary images, and these are combined with GLCM features to boost recognition accuracy. Recognition is performed with four classifiers: artificial neural network (ANN), Naive Bayes, random forest (RF) and support vector machine (SVM). Experiments are carried out on a standard dataset in both clean and noisy versions. The technique achieves an accuracy of 100% for some characters without noise and 94.11% with impulsive noise. The four classifiers are compared with and without noise. On the clean dataset, random forest gives the best average recognition accuracy over all characters, 84.9%. On low-noise datasets, random forest and the artificial neural network have almost the same recognition accuracy; on high-noise datasets, SVM gives the highest recognition accuracy.
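The GLCM features mentioned in the abstract can be illustrated with a minimal sketch. It counts how often a pixel of grey level i has a neighbour of grey level j at a fixed offset, normalises the counts, and derives a few statistics from them. The function names, the single horizontal offset, and the three statistics chosen here (contrast, angular second moment, homogeneity) are assumptions made for illustration; the abstract does not say which offsets or GLCM statistics the authors used.

```python
def glcm(image, dx=1, dy=0, levels=2):
    """Count co-occurrences of grey levels (i, j) at offset (dx, dy).

    `image` is a list of rows of integer grey levels in range(levels).
    The offset and the number of levels are illustrative choices.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def normalise(counts):
    """Turn raw co-occurrence counts into joint probabilities."""
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Sum of (i - j)^2 * p[i][j]: large when neighbours differ."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def angular_second_moment(p):
    """Sum of squared probabilities: large for uniform textures."""
    return sum(v * v for row in p for v in row)


def homogeneity(p):
    """Inverse difference moment: sum of p[i][j] / (1 + (i - j)^2)."""
    return sum(v / (1 + (i - j) ** 2) for i, row in enumerate(p) for j, v in enumerate(row))


# A tiny 3x3 binary "character" used only as a toy input.
toy = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
counts = glcm(toy)          # counts == [[1, 2], [0, 3]]
p = normalise(counts)
features = [contrast(p), angular_second_moment(p), homogeneity(p)]
```

In a pipeline like the one the abstract describes, a feature vector of this kind would be concatenated with the CSM features before being passed to a classifier; how the authors combine the two is not specified in the abstract.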

Highlights

Optical character recognition (OCR) is an automated process for reading printed or handwritten text.
The support vector machine (SVM) outperforms the other classifiers at high levels of impulsive noise.
Four classifiers, namely SVM, Naive Bayes, random forest and neural network, are studied for the classification of offline alphabetic characters.
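Impulsive noise, as referred to in the highlights, is commonly modelled as salt-and-pepper noise: each pixel is independently overwritten with a random extreme value with some probability. The sketch below applies this to a binary image. The function name `add_impulsive_noise`, the `density` parameter, and the fixed seed are hypothetical; the page does not state how the authors generated their noisy datasets or which noise levels count as low or high.

```python
import random


def add_impulsive_noise(image, density, seed=0):
    """Return a noisy copy of a binary image (list of rows of 0/1).

    Each pixel is replaced, with probability `density`, by 0 or 1
    chosen at random. The input image is left unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    noisy = [row[:] for row in image]
    for row in noisy:
        for c in range(len(row)):
            if rng.random() < density:
                row[c] = rng.choice((0, 1))
    return noisy


clean = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
low_noise = add_impulsive_noise(clean, density=0.05)
high_noise = add_impulsive_noise(clean, density=0.50)
```

Sweeping `density` over a range of values and re-evaluating each classifier at every level is one way to reproduce the kind of low-noise versus high-noise comparison reported here.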

Summary

Introduction

Optical character recognition (OCR) is an automated process for reading printed or handwritten text. OCR electronically converts book text, administrative records, office files, marriage records, security records and many other important printed texts into machine-encoded text. Machine-encoded text takes less memory than an image and makes it easier to format, edit and display the text properly [1, 2]. Feature extraction is one of the most important steps for matching or classification. Feature extraction techniques are divided into two groups: linear and nonlinear. The linear extraction techniques are principal component analysis (PCA) [3], independent component analysis (ICA) [4], linear discrim-

Methods

Results

Conclusion