Abstract

Optical Character Recognition (OCR) has made large strides in recognizing printed, properly formatted text. However, comparatively little effort has gone into developing systems that can reliably apply OCR to printed and handwritten text simultaneously, as found in hand-filled forms. Because machine-printed or typed text follows specific formats and fonts while handwritten text is variable and non-uniform, such documents are very hard to classify and recognize using traditional OCR alone. This paper proposes a pre-processing methodology that employs semantic segmentation to identify, segment and crop the boxes containing relevant text in a given image, in order to improve the results of conventional, publicly available OCR engines. The authors also provide a comparison of popular OCR engines, namely Microsoft Cognitive Services, Google Cloud Vision and Amazon Rekognition. The proposed pixel-wise classification technique accurately identifies the areas of an image containing relevant text so that they can be fed to a conventional OCR engine, improving the quality of its output. The methodology also supports the digitization of documents containing mixed text types with improved performance. The experimental study shows that the proposed pipeline provides reliable, high-quality inputs to conventional OCR through this complex image pre-processing, resulting in better accuracy and improved performance.
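The cropping step of the pipeline can be illustrated with a minimal sketch: given a pixel-wise text/non-text mask such as a semantic-segmentation model would produce, crop the image to the tight bounding box of the predicted text pixels before handing it to an OCR engine. The function name and the representation of images as nested lists are illustrative assumptions; the paper's actual segmentation network and OCR invocation are not reproduced here.

```python
def crop_text_region(image, mask):
    """Crop `image` to the bounding box of foreground pixels in `mask`.

    image: list of rows of pixel values.
    mask:  same-shape list of 0/1 flags, 1 marking predicted text pixels
           (e.g. the output of a pixel-wise classifier).
    Returns the cropped sub-image, or the original image if the mask
    contains no text pixels.
    """
    coords = [(r, c)
              for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        return image  # no text detected; pass the image through unchanged
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    top, bottom = min(rows), max(rows) + 1
    left, right = min(cols), max(cols) + 1
    return [row[left:right] for row in image[top:bottom]]


# Toy example: a 6x8 "image" whose text pixels occupy rows 2-3, cols 1-4.
image = [[r * 8 + c for c in range(8)] for r in range(6)]
mask = [[0] * 8 for _ in range(6)]
for r in range(2, 4):
    for c in range(1, 5):
        mask[r][c] = 1

crop = crop_text_region(image, mask)
print(len(crop), len(crop[0]))  # 2 4
```

In the full pipeline the cropped region would then be passed to a conventional engine (e.g. Google Cloud Vision or Microsoft Cognitive Services), which tends to perform better on a tight crop of relevant text than on the raw form image.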

