Abstract

Scene text detection has attracted great interest from the computer vision and pattern recognition communities because text information plays an important role in image indexing and scene understanding. Deep neural networks have become popular for scene text detection, owing especially to their ability to learn strong text features. However, existing state-of-the-art deep-learning-based scene text detection methods detect text from only a single feature map, which cannot capture semantic information at all scales. In this paper, we propose a novel deep-learning-based model that leverages the pyramid structure of feature maps for accurate scene text detection. We also design a deep convolutional neural network model for non-maximum suppression. In addition, we develop a novel loss function and a training method that enable end-to-end training. Experimental results on the standard ICDAR 2015 and MSRA-TD500 datasets show that our end-to-end system is simple and fast and achieves high accuracy. We also create a dataset for scene text detection.
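For context on the non-maximum suppression step: the conventional approach is greedy, IoU-based NMS. The snippet below is a minimal sketch of that baseline, not the authors' CNN-based model. It assumes axis-aligned boxes given as (x1, y1, x2, y2) for simplicity, although scene text detectors often predict rotated or quadrilateral boxes. The names `iou`, `greedy_nms`, and the default `iou_threshold=0.5` are illustrative choices.

```python
from typing import List, Tuple

# Assumption: axis-aligned boxes as (x1, y1, x2, y2); real text boxes may be rotated.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    Visits boxes in descending score order and keeps a box only if it does not
    overlap any already-kept box by more than iou_threshold (hypothetical default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))  # prints [0, 2]
```

The second box overlaps the first with IoU 0.81 and is suppressed, while the distant third box survives. A hand-tuned threshold like this is the kind of fixed rule a learned suppression model aims to replace.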
