Detect Arbitrary-Shaped Text via Adaptive Thresholding and Localization Quality Estimation

Peirui Cheng, Weiqiang Wang, Yuzhong Zhao

doi:10.1109/tcsvt.2023.3274673

Abstract

Two-stage scene text detection algorithms based on Mask R-CNN have achieved good performance on multiple challenging benchmarks. However, their effectiveness is degraded by manually set constant thresholds and by the low localization quality of candidate boxes. In this paper, we present LOAD, a novel scene text detection method based on Mask R-CNN that introduces an adaptive threshold module and a localization quality estimation module to address these two problems. We propose two kinds of adaptive thresholds, used for filtering candidate boxes and for binarizing pixels, respectively. We introduce a self-attention mechanism to obtain the global information needed to generate the adaptive thresholds. In addition, we incorporate localization quality estimation into our model to obtain more accurate candidate boxes for subsequent segmentation. Comparative experiments are conducted on five benchmarks (ICDAR 2015, ICDAR 2017, MSRA-TD500, Total-Text and CTW1500), and the results demonstrate that the proposed method achieves state-of-the-art performance, with F-measures of 91.0%, 78.7%, 87.4%, 90.6% and 86.0%, respectively. We also provide adequate ablation experiments to demonstrate the effectiveness of the proposed components.
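
To make concrete where the two thresholds act, the following is a minimal sketch and not the implementation of LOAD. It assumes the per-image thresholds have already been predicted; the self-attention-based module that generates them is not reproduced. The function names `filter_boxes` and `binarize`, the constant 0.5, and all scores and threshold values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

def filter_boxes(scores: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of candidate boxes whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

def binarize(prob_map: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Turn a per-pixel text probability map into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

# Hypothetical outputs for one image: candidate-box scores and a tiny 2x2 probability map.
scores = [0.92, 0.61, 0.48, 0.30]
prob_map = [[0.90, 0.55],
            [0.45, 0.10]]

# Baseline: a single hand-set constant shared by every image (assumed value).
CONSTANT_T = 0.5
kept_constant = filter_boxes(scores, CONSTANT_T)
mask_constant = binarize(prob_map, CONSTANT_T)

# Adaptive variant: two thresholds supplied per image. Here they are plain
# numbers standing in for the predictions of an adaptive threshold module.
box_t, pixel_t = 0.65, 0.40
kept_adaptive = filter_boxes(scores, box_t)
mask_adaptive = binarize(prob_map, pixel_t)

print(kept_constant, kept_adaptive)
print(mask_constant, mask_adaptive)
```

In this toy input, the stricter box threshold discards one more candidate than the constant does, while the looser pixel threshold keeps one more pixel in the mask. This illustrates only that the two decisions can be tuned independently per image.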
