Abstract

The sheer volume of document-based processes has created a need for automated systems that can reliably digitize the text in forms. For example, the text in scanned administrative forms is not accessible until it has been converted from pixels to editable text. Against this background, many organizations use Optical Character Recognition (OCR) to support the digitization of text in documents. However, integrated OCR approaches that handle both handwritten and machine-printed text are still lacking, although both kinds of text are of major importance when digitizing forms. To address this problem, we propose a new hybrid OCR approach, based on neural networks, that recognizes handwritten and machine-printed text in an integrated manner. We demonstrate the practical applicability of our approach by successfully applying it to publicly available forms. Finally, we evaluate our hybrid approach against existing state-of-the-art approaches.
