Abstract

Scene text detection methods have advanced considerably in the past few years. However, because text backgrounds in natural scenes are so diverse, previous methods often fail on more complicated text instances (e.g., very long text and arbitrarily shaped text). In this paper, a text detection method based on multi-channel bounding box fusion is designed to address this problem. First, a convolutional neural network serves as the backbone for feature extraction, producing both shallow text feature maps and deep semantic text feature maps. Second, a fully convolutional network upsamples the feature maps and fuses them layer by layer to obtain pixel-level text/non-text classification results. Then, two independent detection-box channels are designed: a bounding box regression channel, and a channel that derives bounding boxes directly from the score map. Finally, a multi-channel bounding box fusion mechanism combines the detection boxes from the two channels to produce the final result. Experiments on ICDAR2013 and ICDAR2015 demonstrate that the proposed method achieves competitive results in scene text detection.
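The abstract does not specify how boxes from the two channels are fused, so the following is only a minimal sketch of one plausible scheme, not the authors' mechanism. It assumes axis-aligned boxes stored as `(x1, y1, x2, y2, score)`, matches boxes across channels by intersection over union (IoU), averages matched pairs weighted by score, and keeps unmatched boxes only when their score is high. The names `fuse_boxes`, `iou_thresh`, and `keep_thresh`, and their default values, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, score), axis-aligned.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(
    regression_boxes: List[Box],
    score_map_boxes: List[Box],
    iou_thresh: float = 0.5,   # assumed matching threshold
    keep_thresh: float = 0.8,  # assumed score needed to keep an unmatched box
) -> List[Box]:
    """Fuse detections from two channels (illustrative rule, not the paper's)."""
    fused: List[Box] = []
    used = set()  # indices of score-map boxes already matched

    for r in regression_boxes:
        # Find the unmatched score-map box that overlaps r the most.
        best_j, best_iou = -1, 0.0
        for j, s in enumerate(score_map_boxes):
            if j in used:
                continue
            overlap = iou(r, s)
            if overlap > best_iou:
                best_j, best_iou = j, overlap

        if best_j >= 0 and best_iou >= iou_thresh:
            # Both channels agree: average coordinates weighted by score.
            s = score_map_boxes[best_j]
            used.add(best_j)
            total = r[4] + s[4]
            if total > 0:
                coords = tuple((r[k] * r[4] + s[k] * s[4]) / total for k in range(4))
            else:
                coords = tuple((r[k] + s[k]) / 2 for k in range(4))
            fused.append(coords + (max(r[4], s[4]),))
        elif r[4] >= keep_thresh:
            # Only the regression channel fired: keep if confident.
            fused.append(r)

    # Score-map boxes with no regression partner: keep if confident.
    for j, s in enumerate(score_map_boxes):
        if j not in used and s[4] >= keep_thresh:
            fused.append(s)
    return fused
```

Under this assumed rule, a box supported by both channels survives even at moderate confidence, while a box seen by only one channel must clear the stricter `keep_thresh`.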
