Abstract

In this paper, we present an end-to-end trainable single shot text detector (SSTD) that can handle arbitrarily oriented text in natural images. The SSTD detects text in a single forward pass without any post-processing except a simple non-maximum suppression. We propose a rotational prior box (RPB) mechanism to generate inclined proposals with orientation information, which enables the prior boxes to fit oriented text regions better. The network performs prediction on multiple feature maps of different resolutions to handle text of various sizes and directly outputs bounding boxes. We further develop a dimension clustering strategy to select default box shapes that better fit the text instances. The proposed method is evaluated on three public datasets, namely ICDAR2015, MSRA-TD500 and ICDAR2013. Experimental results demonstrate its superiority in effectiveness and efficiency over several state-of-the-art approaches.
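The abstract does not give implementation details, so the following is only a minimal sketch of how the dimension clustering strategy and rotational prior boxes could be realized, assuming k-means over ground-truth box dimensions with an IoU-based distance and a fixed set of candidate orientation angles; all function names and parameters here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def iou_distance(boxes, centroids):
    # 1 - IoU between (w, h) pairs, treating boxes as if they share a corner.
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return 1.0 - inter / union

def cluster_box_dimensions(gt_wh, k=6, iters=100, seed=0):
    # k-means on ground-truth (width, height) pairs using the IoU distance,
    # returning k representative shapes for the default (prior) boxes.
    rng = np.random.default_rng(seed)
    centroids = gt_wh[rng.choice(len(gt_wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(iou_distance(gt_wh, centroids), axis=1)
        new = np.array([gt_wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def rotated_prior_boxes(cx, cy, shapes, angles):
    # Tile each clustered (w, h) shape at several orientations around one
    # feature-map location (cx, cy), yielding inclined prior boxes
    # parameterized as (cx, cy, w, h, angle).
    return [(cx, cy, w, h, a) for (w, h) in shapes for a in angles]

# Illustrative usage on a toy set of ground-truth text box sizes (pixels).
gt_wh = np.array([[120, 30], [200, 40], [60, 20], [300, 50]], dtype=float)
shapes = cluster_box_dimensions(gt_wh, k=2)
priors = rotated_prior_boxes(cx=64.0, cy=64.0, shapes=shapes,
                             angles=[-np.pi / 4, 0.0, np.pi / 4])
```

In such a scheme the clustered shapes replace hand-picked aspect ratios, and each shape is replicated at a few angles so that at least one prior overlaps an inclined text instance well enough to be matched during training.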
