Abstract

In this paper, we propose a workflow and a machine learning model for recognizing handwritten characters on form documents. The learning model uses a Convolutional Neural Network (CNN) as a powerful feature extractor and a Support Vector Machine (SVM) as a high-end classifier. The proposed method is more efficient than modifying the CNN into a more complex architecture. We evaluated several SVM variants and found that the linear SVM with an L1 loss function and L2 regularization gives the best performance in both accuracy rate and computation time. In experiments using data from NIST SD 19 2nd edition for both training and testing, the proposed method, which combines a CNN with a linear SVM using an L1 loss function and L2 regularization, achieved a better recognition rate than the CNN alone. The recognition rates achieved by the proposed method are 98.85% on numeral characters, 93.05% on uppercase characters, 86.21% on lowercase characters, and 91.37% on the merger of numeral and uppercase characters. By comparison, the original CNN achieves accuracy rates of 98.30% on numeral characters, 92.33% on uppercase characters, 83.54% on lowercase characters, and 88.32% on the merger of numeral and uppercase characters. The proposed method was also validated using ten-fold cross-validation, which showed that it still improves the accuracy rate. The learning model was then used to construct a handwriting recognition system that automatically recognizes more challenging data on form documents. Pre-processing, segmentation, and character recognition are integrated into one system, and the output of the system is converted into editable text. The system achieves an accuracy rate of 83.37% on ten different test form documents.
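
The classifier stage can be illustrated with a minimal sketch, under several assumptions. "L1 loss" is read here as the hinge loss and "L2 regularization" as the squared-norm penalty, following the LIBLINEAR naming convention; this reading is an assumption about the paper's usage. The feature vectors are toy stand-ins for CNN outputs, the labels are binary in {-1, +1}, and the function names and the plain subgradient solver are hypothetical rather than the authors' implementation. A multi-class setting such as character recognition would need, for example, one such classifier per class.

```python
# Sketch of an L2-regularized, L1-loss (hinge) linear SVM on feature vectors.
# Assumption: the vectors below stand in for features extracted by a CNN.

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def svm_objective(w, b, features, labels, C=1.0):
    """0.5 * ||w||^2 + C * sum(max(0, 1 - y * (w.x + b))), with y in {-1, +1}."""
    reg = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) + b))
                for x, y in zip(features, labels))
    return reg + C * hinge

def train(features, labels, C=1.0, lr=0.01, epochs=200):
    """Plain subgradient descent; a simple stand-in for a dedicated solver."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        grad_b = 0.0
        for x, y in zip(features, labels):
            if y * (dot(w, x) + b) < 1.0:  # sample violates the margin
                grad_w = [g - C * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= C * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if dot(w, x) + b >= 0 else -1

# Toy, linearly separable "features" (hypothetical data, not from NIST SD 19).
FEATURES = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
LABELS = [1, 1, -1, -1]

W, B = train(FEATURES, LABELS)
PREDICTIONS = [predict(W, B, x) for x in FEATURES]
```

In practice, a dedicated solver would replace the hand-written training loop; the sketch only makes the loss and regularization terms concrete.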
