Unveiling the potential of pre-processing and differentiable binarization thresholds for invoice text detection

Feiran Liang, Jialun Yang, Yu Wu

doi:10.54254/2755-2721/6/20230654

Abstract

Invoice text detection aims to automatically identify key information in invoices. It is a representative application of Optical Character Recognition (OCR) technology and a research hotspot in the computer vision community. The classic OCR pipeline consists of three key steps: preprocessing, text detection, and text recognition. In the feature extraction stage, it suffers from problems such as insufficient generalization ability and poor robustness. In this paper, we reveal the great potential of preprocessing strategies for improving OCR accuracy and propose an invoice text detection method based on DBNet and differentiable binarization thresholds. Specifically, we first introduce color segmentation and local adaptive thresholding to improve preprocessing, which effectively suppresses the influence of background information and context noise on the detection results. In addition, a differentiable binarization threshold is introduced in feature extraction, which speeds up error correction throughout the deep learning process. Experiments show that detection results improve significantly after color segmentation or background removal, compared with the original images. We therefore recommend that invoice text detection give proper attention to the preprocessing stage.
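The two thresholding ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper. The function names, the window size, and the offset are hypothetical choices made for illustration. The binarization formula B = 1 / (1 + exp(-k (P - T))) with k = 50 follows the original DBNet formulation, and we assume here that the same form applies.

```python
import numpy as np


def local_mean_threshold(gray, window=15, offset=10.0):
    """Local adaptive thresholding (illustrative; parameters are assumptions).

    A pixel is marked as text (True) when it is darker than the mean of its
    window x window neighbourhood minus `offset`.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    pad = window // 2
    # Replicate border pixels so every pixel has a full neighbourhood.
    padded = np.pad(gray, pad, mode="edge")
    # Integral image with a leading zero row and column for simple window sums.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    integral = integral.cumsum(axis=0).cumsum(axis=1)
    window_sum = (
        integral[window:window + h, window:window + w]
        - integral[:h, window:window + w]
        - integral[window:window + h, :w]
        + integral[:h, :w]
    )
    local_mean = window_sum / (window * window)
    return gray < (local_mean - offset)


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate step function from DBNet: B = 1 / (1 + exp(-k (P - T))).

    Unlike a hard threshold, this is differentiable in both P and T, so the
    threshold map can be learned together with the probability map.
    """
    prob_map = np.asarray(prob_map, dtype=np.float64)
    thresh_map = np.asarray(thresh_map, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))


if __name__ == "__main__":
    # Synthetic "invoice": bright background with one dark text pixel.
    image = np.full((20, 20), 200.0)
    image[10, 10] = 50.0
    mask = local_mean_threshold(image)
    print("text pixels found:", int(mask.sum()))

    # Probabilities above the threshold map are pushed towards 1, below towards 0.
    p = np.array([0.1, 0.5, 0.9])
    t = np.array([0.3, 0.5, 0.3])
    print(differentiable_binarization(p, t))
```

In this sketch the local threshold adapts to uneven backgrounds because each pixel is compared only with its own neighbourhood, while the steep sigmoid behaves almost like a hard threshold yet still passes gradients to both maps during training.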

Full Text