Enhancing Predictability of Handwritten Document Content using HTR and Word Substitution

Varshini Prakash*, Keshav Moorthy, Jasmin T Jose
Computer Science, Vellore University Of Technology, Vellore, India

doi:10.35940/ijisme.g1240.056720


Open Access

https://doi.org/10.35940/ijisme.g1240.056720


Abstract

The accuracy of Handwritten Text Recognition (HTR) degrades sharply when documents are damaged by smudges, blemishes and blurs, which makes recognising such documents a challenging task. We therefore propose a system to identify handwritten textual content in documents where state-of-the-art Optical Character Recognition (OCR), even at its full capability, performs with low accuracy. We introduce word substitution, which uses character and distance analysis against a word corpus for spell checking and word completion in the damaged areas. This improved our prediction results, especially in cases where the OCR is prone to predicting false positives on smudged areas. Blur detection is also run on every word before segmentation; blurred words are not passed through as OCR predictions, which avoids false positives, and are instead substituted with suitable words. This methodology is more convenient and reliable, since even state-of-the-art HTR technologies do not exceed 71% accuracy. The accuracy of the predicted text is measured using the text similarity metric Fuzzy Token Set Ratio (FTSR).
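The token set ratio named in the abstract can be illustrated with a short sketch. This version follows the general scheme used by common fuzzy-matching libraries (compare the shared tokens against each side's full token set, ignoring order and repeats), but it relies on Python's standard `difflib` for the underlying string similarity, so its scores may differ slightly from whatever implementation the authors used. The function names are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def _tokens(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _ratio(a, b):
    """Similarity of two strings on a 0-100 scale (difflib-based, an assumption)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio(predicted, reference):
    """Score two texts by their token sets, ignoring word order and repeats."""
    p, r = _tokens(predicted), _tokens(reference)
    if not p or not r:
        return 0
    shared = " ".join(sorted(p & r))
    # Shared tokens followed by the tokens unique to each side.
    full_p = (shared + " " + " ".join(sorted(p - r))).strip()
    full_r = (shared + " " + " ".join(sorted(r - p))).strip()
    return max(_ratio(shared, full_p), _ratio(shared, full_r), _ratio(full_p, full_r))


if __name__ == "__main__":
    print(token_set_ratio("the quick brwn fox", "the quick brown fox"))
```

Because the shared tokens are compared against each full token set, a prediction whose tokens are all present in the reference scores 100 even if the reference contains extra words; this is a property of the metric, not of this sketch.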

Highlights

Smeared documents are handwritten or printed hard-copy documents that have been exposed to the environment and damaged by foreign substances such as liquids, dust and dirt
Optical Character Recognition (OCR) for handwritten documents remains a challenge; one way to tackle it is to combine OCR with Natural Language Processing (NLP) for sentence completion until OCR matures enough to identify text across varied handwriting, symbols and writing styles
For Handwritten Text Recognition, we propose a transfer learning approach using an image-based sequence recognition algorithm [12] that runs on six CNN layers for feature extraction and two RNN layers

Summary

INTRODUCTION

Smeared documents are handwritten or printed hard-copy documents that have been exposed to the environment and damaged by foreign substances such as liquids, dust and dirt. These substances either cause smudges and blemishes or obscure certain characters so that they cannot be identified using Optical Character Recognition (OCR). OCR for handwritten documents remains a challenge; one way to tackle it is to combine OCR with Natural Language Processing (NLP) for sentence completion until OCR matures enough to identify text across varied handwriting, symbols and writing styles. That maturity may take a long time to reach, given the many ways in which humans write different characters. There have been significant improvements in segmentation technologies focusing on historical manuscripts, but these are very specific, and the datasets they have been tested on are limited.

EXISTING SYSTEMS

PROPOSED FRAMEWORK

Blur Detection

Handwriting Text Recognition

Word Substitution
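The abstract describes word substitution as character and distance analysis against a word corpus. A minimal sketch of that idea follows, assuming Levenshtein edit distance as the distance measure and a toy corpus. The distance measure, the `max_distance` threshold, and the names `levenshtein`, `substitute` and `CORPUS` are illustrative assumptions, not details taken from the paper.

```python
def levenshtein(a, b):
    """Edit distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def substitute(word, corpus, max_distance=2):
    """Return the closest corpus word within max_distance, else the word unchanged."""
    if not corpus or word in corpus:
        return word
    best = min(corpus, key=lambda candidate: levenshtein(word, candidate))
    return best if levenshtein(word, best) <= max_distance else word


# Toy corpus and OCR output, for illustration only.
CORPUS = ["the", "quick", "brown", "fox", "jumps"]
ocr_output = "the quick brwn fax".split()
corrected = " ".join(substitute(w, CORPUS) for w in ocr_output)

if __name__ == "__main__":
    print(corrected)
```

The threshold keeps words that are far from every corpus entry unchanged, which avoids replacing a correctly recognised but out-of-corpus word with an unrelated one. A real system would likely also weigh word frequency or context when several candidates tie.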

THEORETICAL BACKGROUND

AND DISCUSSION

Findings

CONCLUSION


Journal: International Journal of Innovative Science and Modern Engineering
Publication Date: May 15, 2020
License type: cc-by

Similar Papers

Fundamentals in Handwriting Recognition
Sebastiano Impedovo
01 Jan 1993

How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR
...
02 Jun 2020

Impact of Deep Learning on Localizing and Recognizing Handwritten Text in Lecture Videos
Lakshmi Haritha Medida ... Kasarapu Ramani
International Journal of Advanced Computer Science and Applications | VOL. 12
01 Jan 2020

Deep Learning Analysis in Development of Handwritten and Plain Text Classification API
Danny Gani ... Maulahikmah Galinium
21 Sep 2022
