Real time noisy dataset implementation of optical character identification using CNN

S Veni, R Anand, T Shanthi, R.S Sabeenian

doi:10.1504/ijie.2020.10026346

Abstract

Optical character recognition (OCR), which recognises the characters in an image, is a major research problem in real-time applications. English is used worldwide, and recognising English characters remains a challenging task. Deep learning is one approach to recognising optical characters. The aim of this work is to perform character recognition using a convolutional neural network (CNN) with the LeNet architecture. The dataset is built from scanned passports, with Tesseract used to generate all the characters and digits. It has a training set of 60,795 samples and a test set of 7,767 samples, for a total of 68,562 samples spread across 62 labels. To date, there has been no research on predicting all 52 characters and ten digits. The algorithm is based on deep learning with appropriately chosen layers, which significantly improves accuracy and reduces the error rate. The developed model achieves 93.4% accuracy on the training set and 86.5% accuracy on the test set.
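As a rough illustration of the figures in the abstract, the sketch below checks the label and sample counts and tabulates the layer shapes of a classic LeNet-5 style network with a 62-way output. The abstract does not give the input resolution or the exact layer sizes, so the 32x32 grayscale input, the 6 and 16 convolution filters, and the 120 and 84 unit dense layers are assumptions taken from the standard LeNet-5 layout, not the authors' configuration. The names `LABELS`, `conv_out` and `lenet_summary` are hypothetical helpers introduced here.

```python
import string

# Label set described in the abstract: 52 letters plus ten digits (62 labels).
LABELS = string.ascii_uppercase + string.ascii_lowercase + string.digits

# Sample counts quoted in the abstract.
TRAIN_SAMPLES = 60_795
TEST_SAMPLES = 7_767


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_summary(input_size=32, in_channels=1, num_classes=len(LABELS)):
    """Return (layer, output shape, parameter count) rows.

    Assumes the classic LeNet-5 layout (5x5 convolutions, 2x2 pooling);
    the paper's actual layer sizes are not stated in the abstract.
    """
    rows = []
    size, channels = input_size, in_channels
    for name, filters in (("conv1", 6), ("conv2", 16)):
        size = conv_out(size, kernel=5)
        params = filters * (channels * 5 * 5) + filters  # weights + biases
        channels = filters
        rows.append((name, (channels, size, size), params))
        size = conv_out(size, kernel=2, stride=2)  # 2x2 pooling, no parameters
        rows.append((name.replace("conv", "pool"), (channels, size, size), 0))
    features = channels * size * size  # flattened feature vector length
    for name, units in (("fc1", 120), ("fc2", 84), ("out", num_classes)):
        rows.append((name, (units,), features * units + units))
        features = units
    return rows


if __name__ == "__main__":
    print("labels:", len(LABELS))
    print("total samples:", TRAIN_SAMPLES + TEST_SAMPLES)
    for layer, shape, params in lenet_summary():
        print(f"{layer:6s} {str(shape):14s} {params}")
```

Under these assumptions the final layer has one output per label, so widening the classic ten-digit LeNet output to 62 classes only changes the size of the last dense layer.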

Full Text