Abstract

In this paper, we propose a new method for text detection in natural images using weakly supervised data. Most current text detection models are trained on data labeled with bounding boxes, but such labels are very expensive to produce. To address this problem, we propose an attention mechanism that can be trained on data with only image-level labels and that roughly identifies text regions via an automatically learned attention map based on a convolutional neural network. The method has three main steps. First, a VGG model is trained on image-level labels to score the likelihood that a text region exists in the image. Second, regions of interest are extracted by means of the attention mechanism, and each extracted region is evaluated with the network trained in the first step to obtain the text regions. Finally, text lines are extracted from the text regions using the MSER algorithm. Trained on weakly supervised data with only image-level labels, our model can generate bounding boxes for the text lines in an image. On the MSRA-TD500, ICDAR2013, and ICDAR2015 text detection benchmarks, the results of our model are very close to those of models trained with bounding box labels.
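
The second step, turning an attention map into a candidate region, can be illustrated with a minimal sketch. The abstract does not say how regions are derived from the map, so the rule used here (keep cells at or above a fraction of the peak attention and take their tightest enclosing box) is an assumption, and the names `attention_to_box` and `rel_threshold` are hypothetical. In the pipeline described above, such a box would then be scored by the classifier from the first step before MSER is applied.

```python
from typing import List, Optional, Tuple

# (top, left, bottom, right), all indices inclusive
Box = Tuple[int, int, int, int]


def attention_to_box(
    attention: List[List[float]], rel_threshold: float = 0.5
) -> Optional[Box]:
    """Tightest box around cells with attention >= rel_threshold * peak.

    Assumed rule for illustration only; not taken from the paper.
    Returns None when the map is empty or has no positive response.
    """
    peak = max((v for row in attention for v in row), default=0.0)
    if peak <= 0.0:
        return None
    cutoff = rel_threshold * peak
    # Rows and columns that contain at least one cell above the cutoff.
    rows = [r for r, row in enumerate(attention) if any(v >= cutoff for v in row)]
    cols = [c for row in attention for c, v in enumerate(row) if v >= cutoff]
    return (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    toy_map = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.2, 0.9, 0.0],
        [0.0, 0.8, 1.0, 0.0],
        [0.0, 0.0, 0.1, 0.0],
    ]
    print(attention_to_box(toy_map))
```

A single enclosing box is the simplest choice; an image with several separate text areas would need connected-component grouping instead, which this sketch does not attempt.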
