Abstract

In the past few years, great effort has been devoted to scene text detection. Nevertheless, efficient text detection in the wild remains a challenging problem: methods designed for general object detection are usually limited in handling the arbitrary orientations and large aspect ratios of scene text. In this paper, we present a novel scene text detection method, the text keypoint detection network (TKDN), which treats text detection as a text keypoint detection task performed in a coarse-to-fine scheme. Specifically, TKDN first generates coarse text instance regions using ResNet50 together with a feature pyramid network (FPN) and a region proposal network (RPN). Within the coarse text regions, we then perform text keypoint detection, bounding box classification and regression, and text region segmentation in a multi-task manner. In the inference stage, an effective post-processing algorithm combines the outputs of the three branches to obtain the final text keypoint detection results. The proposed TKDN approach outperforms state-of-the-art approaches and achieves an F-measure of 82.0% on the public-domain ICDAR2015 database.
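
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-measure is typically computed, assuming the standard definition as the harmonic mean of precision and recall. The function name `f_measure` and the example values are hypothetical and chosen only for illustration; they are not the precision and recall behind the 82.0% figure, which the abstract does not report.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F-measure)."""
    if precision + recall == 0:
        # No correct detections at all: define the score as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only; not results from the paper.
example_precision = 0.80
example_recall = 0.75
print(f"F-measure: {f_measure(example_precision, example_recall):.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector must balance precision and recall to score well; a high value in one cannot compensate for a low value in the other.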
