Abstract

Handwritten character recognition has been studied extensively for many years in the field of pattern recognition. Because of its wide practical applications and financial implications, it remains an important research area. In this research, a Handwritten Ethiopian Character Recognition (HECR) dataset is prepared to train a model. The images in the HECR dataset were written with pens of more than one color, are stored in the RGB color space, and are size-normalized to 28 × 28 pixels. The dataset combines scripts (Fidel in Ethiopia), numerical representations, punctuation marks, tonal symbols, combining symbols, and special characters. These scripts have been used to write the ancient histories, science, and arts of Ethiopia and Eritrea. In this study, a hybrid model of two powerful classifiers, a Convolutional Neural Network (CNN) and eXtreme Gradient Boosting (XGBoost), is proposed for classification. In this integrated model, the CNN works as a trainable automatic feature extractor from the raw images, and XGBoost takes the extracted features as input for recognition and classification. The output error rates of the CNN with a fully connected layer and of the hybrid model are compared: error rates of 0.4630 and 0.1612, respectively, were achieved in classifying the handwritten testing dataset images. XGBoost as a classifier gave better results than the traditional fully connected layer.
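The division of labor described above (a convolutional stage that turns raw images into features, followed by a separate classifier that consumes those features) can be illustrated with a minimal sketch. It is not the authors' implementation. The following are assumptions made only for illustration: fixed, untrained filters stand in for a trained CNN; a small hypothetical `NearestCentroid` class stands in for XGBoost; and all function names are hypothetical.

```python
import numpy as np


def conv_features(images, filters):
    """Convolution (valid) + ReLU + global max pooling.

    images:  (n, H, W, C) float array
    filters: (k, fh, fw, C) float array (assumed fixed here, not trained)
    returns: (n, k) feature matrix, one value per filter
    """
    n, H, W, _ = images.shape
    k, fh, fw, _ = filters.shape
    feats = np.zeros((n, k))  # starting at 0 makes the running max act as ReLU
    for i in range(H - fh + 1):
        for j in range(W - fw + 1):
            patch = images[:, i:i + fh, j:j + fw, :]  # (n, fh, fw, C)
            resp = np.tensordot(patch, filters, axes=([1, 2, 3], [1, 2, 3]))
            feats = np.maximum(feats, resp)  # global max over all positions
    return feats


class NearestCentroid:
    """Hypothetical stand-in for the second-stage classifier (fit/predict)."""

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def hybrid_fit_predict(clf, filters, x_train, y_train, x_test):
    """Stage 1 extracts features; stage 2 (clf) is trained on them."""
    clf.fit(conv_features(x_train, filters), y_train)
    return clf.predict(conv_features(x_test, filters))


def error_rate(y_true, y_pred):
    """Fraction of misclassified samples."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))


# Synthetic 28 x 28 RGB data: class 1 images contain a bright square.
x = np.zeros((20, 28, 28, 3))
x[10:, 5:15, 5:15, :] = 1.0
y = np.array([0] * 10 + [1] * 10)
filters = np.ones((1, 3, 3, 3))
pred = hybrid_fit_predict(NearestCentroid(), filters, x, y, x)
print("error rate on synthetic data:", error_rate(y, pred))
```

Because the second stage only needs `fit` and `predict`, a gradient-boosted classifier such as `xgboost.XGBClassifier` could be passed in place of the stand-in, assuming the `xgboost` package is installed and the labels are integers starting at 0. In a setup closer to the one described, the features would come from the trained CNN rather than from fixed filters.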

Highlights

  • In the field of pattern recognition, handwritten character recognition has been widely studied for many years

  • The present study mainly focuses on the recognition part of Optical Character Recognition (OCR)

  • As Ethiopian handwritten character recognition has not been studied yet, the main contributions of this study are the following: 1) 502 Ethiopian scripts were collected and ordered in their sequence; 2) as there was no existing offline dataset for Ethiopian scripts, a new dataset is prepared in 28 × 28 pixels that are manually cropped; 3) for the first time, a handwritten dataset that was prepared with more than one color pen is used; and 4) a combined model of Convolutional Neural Network (CNN) and XGBoost is proposed for Ethiopian handwritten scripts recognition


Summary

INTRODUCTION

In the field of pattern recognition, handwritten character recognition has been widely studied for many years. Training even the simplest models can take a long time; to overcome this problem, an algorithm called XGBoost was developed. Many of the ancient histories, science, and arts of Ethiopia and Eritrea survive as handwritten documents. To preserve these documents, modern commercial Optical Character Recognition (OCR) software is required. As Ethiopian handwritten character recognition has not been studied yet, the main contributions of this study are the following: 1) 502 Ethiopian scripts were collected and ordered in their sequence; 2) as there was no existing offline dataset for Ethiopian scripts, a new dataset of manually cropped 28 × 28 pixel images is prepared; 3) for the first time, a handwritten dataset prepared with pens of more than one color is used; and 4) a combined model of CNN and XGBoost is proposed for Ethiopian handwritten script recognition.
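As an illustration of contribution 2, the sketch below shows one way a manually cropped RGB character image could be size-normalized to 28 × 28 pixels. The summary does not state the interpolation method or the intensity scaling, so the nearest-neighbor sampling and the division by 255 are assumptions, and the function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, size=28):
    """Nearest-neighbor resize of an (H, W, 3) RGB crop to (size, size, 3)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def normalize_crop(img, size=28):
    """Resize and scale 8-bit intensities to [0, 1] (scaling is an assumption)."""
    return resize_nearest(img, size).astype(np.float32) / 255.0


# Hypothetical 56 x 84 crop standing in for a scanned character.
crop = (np.arange(56 * 84 * 3).reshape(56, 84, 3) % 256).astype(np.uint8)
sample = normalize_crop(crop)
print(sample.shape, sample.dtype)
```

Keeping all three color channels, rather than converting to grayscale, matches the description of a dataset written with pens of more than one color.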

RELATED WORKS
FEEDFORWARD NETWORKS
MATERIAL AND METHOD
CNN CLASSIFIER
HYBRID CNN-XGBoost MODEL
RESULTS AND DISCUSSIONS
CONCLUSION