Abstract

To improve the robustness of text detectors on scene text of various scales, this paper proposes a single shot text detector that combines local and non-local features. We present a dilated inception module for local feature extraction and a text self-attention module for non-local feature extraction, and integrate both into the single shot detector (SSD) framework for generic object detection to perform multi-oriented text detection in natural scenes. The proposed modules provide a richer and wider receptive field and enhance feature representation, which improves the performance of our text detector. In addition, whereas previous SSD-based text detectors classify positive and negative samples according to default boxes, we use pixels as the reference, which gives more accurate matching with the ground truth and avoids complex anchor design. To evaluate the effectiveness of the proposed method, we carry out several comparative experiments on public standard benchmarks and analyze the results in detail. The experimental results show that the proposed text detector is competitive with state-of-the-art methods.
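As a rough illustration of why dilation widens the receptive field, the sketch below computes the input span covered by a single dilated convolution kernel, using the standard relation k + (k - 1)(d - 1). The 3x3 kernel size, the dilation rates 1, 2 and 3, and the function names are assumptions chosen for illustration; the abstract does not state the configuration used in the proposed dilated inception module.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span (in pixels) covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def branch_spans(kernel_size: int, dilations: list[int]) -> dict[int, int]:
    """Map each dilation rate to the span of a parallel branch using it."""
    return {d: effective_kernel_size(kernel_size, d) for d in dilations}


# Hypothetical inception-style block: three parallel 3x3 branches
# with dilation rates 1, 2 and 3 (illustrative values only).
spans = branch_spans(3, [1, 2, 3])
print(spans)  # {1: 3, 2: 5, 3: 7}
```

Parallel branches with different dilation rates see the same position at several spatial extents without adding parameters, which is one plausible reading of the "richer and wider receptive field" described above.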
