A Novel Hybrid Optical Character Recognition Approach for Digitizing Text in Forms

Roland Graef, Mazen M N Morsy

doi:10.1007/978-3-030-19504-5_14

Abstract

The large number of document-based processes has considerably increased the need for automated systems that can appropriately digitize text in forms. For example, the text in scanned administrative forms is not accessible without an adequate conversion from pixels to editable text. Against this background, many organizations tap the potential of Optical Character Recognition (OCR), as it can support the digitization of text in documents. However, there is still a lack of integrated OCR approaches that consider both handwritten and machine-printed text, both of which are of major importance when digitizing text in forms. To address this problem, we propose a new hybrid OCR approach that recognizes handwritten and machine-printed text in an integrated manner based on neural networks. We demonstrate the practical applicability of our approach by successfully applying it to publicly available forms. Finally, we evaluate our novel hybrid approach in comparison to existing state-of-the-art approaches.

Full Text