Abstract

This study aims to develop a holistic system for recognising handwritten English characters in manually filled forms by systematically synthesising a robust dataset of handwritten characters that adequately represents variation in handwriting.

As part of this study, 572 copies of a form were filled in by over 200 different individuals to introduce demographic variation. The forms were then scanned, and each handwritten character was labelled and extracted using standard image processing techniques. The resulting dataset of 84,712 character images (HW-dataset) contains both alphabetical and numerical characters. Three hybrid datasets (h-EH) were then formed by combining EMNIST datasets with the HW-dataset: Digits (h-EHd, 329,668 character images), Alphabets (h-EHa, 163,085 character images) and a mixture of digits and alphabets (h-EHm, 189,586 character images). To automate the digitisation of handwritten forms, an anchor-based image extraction technique was used in conjunction with a Multi-Channel CNN (MCCNN) model trained on the three versions of h-EH.

On test data, the MCCNN achieves classification accuracies of 93%, 96% and 93% for h-EHa, h-EHd and h-EHm, respectively. Models trained only on the EMNIST dataset perform poorly on test data. The anchor-based object detection method, used in conjunction with the MCCNN trained on h-EH, produces excellent results in digitising hand-filled forms.

Touch-free solutions will gain prevalence because of the emerging threat posed by fomites. In such a setting, manual handling of forms for data entry, digitisation and information handling will be regarded as a potential health and safety hazard. The solution presented in this work uses a combination of models trained on a hybrid handwritten dataset with high demographic variability, and the resulting model is well suited to enabling touch-free handling of documents.
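The abstract does not describe how the anchor-based extraction step works. The sketch below shows one common way such a step can be organised, assuming that each form carries a printed anchor mark whose pixel position on the scan has already been found (for example by template matching), and that every character cell sits at a fixed offset from that anchor on a clean reference template. The names `Box`, `TEMPLATE_ANCHOR`, `TEMPLATE_CELLS`, `locate_cells` and `crop`, and all coordinates, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


# Hypothetical reference template: anchor position and character-cell boxes
# measured once on a clean scan of the blank form (illustrative values only).
TEMPLATE_ANCHOR = (50, 40)
TEMPLATE_CELLS = [Box(120, 200, 32, 40), Box(156, 200, 32, 40), Box(192, 200, 32, 40)]


def locate_cells(anchor_xy, template_anchor=TEMPLATE_ANCHOR,
                 cells=TEMPLATE_CELLS, scale=1.0):
    """Map template cell boxes onto a scan whose anchor was detected at anchor_xy."""
    ax, ay = anchor_xy
    tx, ty = template_anchor
    located = []
    for c in cells:
        # Offset of the cell from the anchor on the template, rescaled to the scan.
        dx = (c.x - tx) * scale
        dy = (c.y - ty) * scale
        located.append(Box(round(ax + dx), round(ay + dy),
                           round(c.w * scale), round(c.h * scale)))
    return located


def crop(image, box):
    """Crop a row-major 2D list image to box, clipping negative origins to 0."""
    rows = image[max(box.y, 0): box.y + box.h]
    return [row[max(box.x, 0): box.x + box.w] for row in rows]


if __name__ == "__main__":
    # Anchor found 10 px right and 5 px below its template position.
    for cell in locate_cells((60, 45)):
        print(cell)
```

Because every cell is expressed relative to the anchor, a scan that is shifted (or uniformly rescaled via `scale`) still yields correctly placed crops, which could then be passed to a character classifier such as the MCCNN. Rotation or perspective correction is not handled in this sketch.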

