Abstract

Two-stage scene text detection algorithms based on Mask R-CNN have achieved strong performance on multiple challenging benchmarks. However, their effectiveness is limited by manually set constant thresholds and by the low localization quality of candidate boxes. In this paper, we present a novel scene text detection method based on Mask R-CNN, named LOAD, which introduces an adaptive threshold module and a localization quality estimation module to address these two problems. We propose two kinds of adaptive thresholds, used for filtering candidate boxes and for binarizing pixels, respectively. A self-attention mechanism captures the global information from which the adaptive thresholds are generated. In addition, we incorporate localization quality estimation into the model to obtain more accurate candidate boxes for the subsequent segmentation. Comparative experiments on five benchmarks (ICDAR 2015, ICDAR 2017, MSRA-TD500, Total-Text and CTW1500) demonstrate that the proposed method achieves state-of-the-art performance, with F-measures of 91.0%, 78.7%, 87.4%, 90.6% and 86.0%, respectively. We also provide ablation experiments to demonstrate the effectiveness of the proposed components.
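To illustrate why a constant binarization threshold can hurt, the following is a minimal sketch, not the LOAD implementation. It binarizes a small, low-contrast probability map once with a fixed threshold of 0.5 and once with a per-image threshold. The function `stand_in_threshold` and the sample values are hypothetical: LOAD predicts its thresholds with a self-attention module, whereas this sketch simply takes the midpoint of the map's value range as a stand-in.

```python
from typing import List

ProbMap = List[List[float]]


def binarize(prob_map: ProbMap, threshold: float) -> List[List[int]]:
    """Mark a pixel as text (1) when its probability exceeds the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def stand_in_threshold(prob_map: ProbMap) -> float:
    """Hypothetical stand-in for a learned per-image threshold.

    Returns the midpoint of the map's value range. This is an assumption
    made for illustration; it is not how LOAD computes its thresholds.
    """
    values = [p for row in prob_map for p in row]
    return (min(values) + max(values)) / 2.0


# Illustrative low-contrast map: the text row peaks at only 0.45.
low_contrast = [
    [0.05, 0.10, 0.05],
    [0.40, 0.45, 0.40],
    [0.05, 0.10, 0.05],
]

# A constant threshold of 0.5 discards every pixel of the text row.
fixed_mask = binarize(low_contrast, 0.5)

# A threshold derived from the map itself keeps the text row.
adaptive_threshold = stand_in_threshold(low_contrast)
adaptive_mask = binarize(low_contrast, adaptive_threshold)
```

In this toy case the fixed threshold yields an empty mask, while the per-image threshold recovers the middle row, which is the kind of failure an input-dependent threshold is meant to avoid.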
