Abstract
In this paper, we present a robust approach for text detection in natural images based on a region proposal mechanism. We propose a powerful low-level detector, named saliency-enhanced MSER, which extends the widely used MSER by incorporating saliency detection methods to ensure a high recall rate. Given a natural image, the saliency-enhanced MSER algorithm extracts character candidates from three channels of a perception-based, illumination-invariant color space. A discriminative convolutional neural network (CNN), jointly trained with multi-level information including pixel-level and character-level information, serves as the character candidate classifier. Instead of a conventional one-step classification, each image patch is classified as strong text, weak text, or non-text by double-threshold filtering of the confidence scores produced by the CNN. To further prune non-text regions, we develop a recursive neighborhood search algorithm that tracks credible text from the weak text set. Finally, characters are grouped into text lines using heuristic features such as spatial location, size, color, and stroke width. We compare our approach with several state-of-the-art methods, and experiments show that our method achieves competitive performance on the public ICDAR 2011 and ICDAR 2013 datasets.
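The double-threshold filtering and recursive neighborhood search described above resemble hysteresis thresholding. Strong candidates are accepted outright, and weak candidates survive only if they connect to accepted text. The abstract does not give the threshold values or the neighborhood criterion, so the following is a minimal sketch under stated assumptions, not the authors' implementation. The thresholds `t_high=0.8` and `t_low=0.4`, the `Candidate` type, and the `are_neighbors` rule (center distance relative to character height, plus a height-ratio check) are all hypothetical, and the CNN confidence is supplied as a precomputed `score`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A character candidate: bounding box (x, y, w, h) plus a classifier score."""
    x: float
    y: float
    w: float
    h: float
    score: float  # hypothetical CNN text confidence in [0, 1]


def classify(cands: List[Candidate], t_high: float = 0.8,
             t_low: float = 0.4) -> Tuple[List[int], List[int], List[int]]:
    """Double-threshold filtering: return indices of strong, weak, and non-text candidates."""
    strong, weak, non_text = [], [], []
    for i, c in enumerate(cands):
        if c.score >= t_high:
            strong.append(i)
        elif c.score >= t_low:
            weak.append(i)
        else:
            non_text.append(i)
    return strong, weak, non_text


def are_neighbors(a: Candidate, b: Candidate,
                  dist_ratio: float = 2.0, height_ratio: float = 2.0) -> bool:
    """Hypothetical neighborhood test: nearby centers and comparable heights."""
    ax, ay = a.x + a.w / 2, a.y + a.h / 2
    bx, by = b.x + b.w / 2, b.y + b.h / 2
    dist = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
    similar_height = max(a.h, b.h) / min(a.h, b.h) <= height_ratio
    return dist <= dist_ratio * max(a.h, b.h) and similar_height


def track_text(cands: List[Candidate], t_high: float = 0.8,
               t_low: float = 0.4) -> List[int]:
    """Keep all strong text plus weak text reachable from strong text via neighbors."""
    strong, weak, _ = classify(cands, t_high, t_low)
    weak_pool = set(weak)   # weak candidates not yet promoted
    accepted = set(strong)  # strong text is accepted directly

    def search(i: int) -> None:
        # Recursively promote every weak candidate adjacent to accepted candidate i.
        for j in list(weak_pool):
            if j in weak_pool and are_neighbors(cands[i], cands[j]):
                weak_pool.discard(j)
                accepted.add(j)
                search(j)

    for i in strong:
        search(i)
    return sorted(accepted)


example = [
    Candidate(0, 0, 10, 20, 0.95),    # strong text
    Candidate(30, 0, 10, 20, 0.55),   # weak, adjacent to candidate 0
    Candidate(60, 0, 10, 20, 0.50),   # weak, adjacent only to candidate 1
    Candidate(300, 0, 10, 20, 0.60),  # weak but isolated
    Candidate(90, 0, 10, 20, 0.10),   # non-text
]
print(track_text(example))  # → [0, 1, 2]
```

In this toy example, candidate 2 is too far from the strong candidate 0 to be accepted directly but is recovered through candidate 1. The isolated weak candidate 3 is pruned, and candidate 4 is never promoted even though it sits next to accepted text, because its score falls below the low threshold. The recursive formulation mirrors the wording of the abstract; for very large candidate sets, an explicit stack would avoid Python's recursion limit.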
Highlights
Reading text in the wild is significant for a variety of advanced computer vision applications, such as image and video retrieval, scene understanding, and visual assistance, since text in images usually conveys valuable information.
We propose a robust approach that combines the advantages of both Maximally Stable Extremal Region (MSER) and convolutional neural network (CNN) feature representations.
We evaluated the proposed method on two widely cited benchmarks for scene text detection: the ICDAR 2011 Robust Reading Competition (RRC) dataset [53] and the ICDAR 2013 RRC dataset [17].
Summary
Reading text in the wild is significant for a variety of advanced computer vision applications, such as image and video retrieval, scene understanding, and visual assistance, since text in images usually conveys valuable information. Detecting and recognizing text in scene images has therefore received increasing attention in the computer vision community. Although extensively studied in recent years, text detection in unconstrained environments remains challenging due to a number of factors, such as high variation in character font, size, color, and orientation, as well as complicated backgrounds and non-uniform illumination. Previous work on scene text detection falls into two mainstream categories: methods based on sliding windows [1,2,3,4,5] and methods based on connected component analysis [6,7,8,9,10,11,12,13,14]. Sliding-window-based methods localize text regions by shifting a classification window across the image at multiple scales.
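A multi-scale sliding-window detector can be sketched as follows. This is a minimal illustration under assumptions, not a method from the cited works: the window size, stride, and scale set are hypothetical defaults, only window coordinates are enumerated (no image resampling is performed), and `classify` is a placeholder for whatever text/non-text classifier a given method uses.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in original-image pixels


def sliding_windows(img_w: int, img_h: int, win_w: int = 32, win_h: int = 32,
                    stride: int = 16,
                    scales: Tuple[float, ...] = (1.0, 0.5, 0.25)) -> Iterator[Box]:
    """Slide a fixed-size window over each level of an image pyramid.

    Each window is mapped back to original-image coordinates, so smaller
    scales correspond to larger regions of the original image.
    """
    for s in scales:
        scaled_w, scaled_h = int(img_w * s), int(img_h * s)
        for y in range(0, scaled_h - win_h + 1, stride):
            for x in range(0, scaled_w - win_w + 1, stride):
                yield (round(x / s), round(y / s), round(win_w / s), round(win_h / s))


def detect(img_w: int, img_h: int, classify: Callable[[Box], float],
           threshold: float = 0.5, **kwargs) -> List[Box]:
    """Keep every window whose (placeholder) classifier score reaches the threshold."""
    return [b for b in sliding_windows(img_w, img_h, **kwargs) if classify(b) >= threshold]


# A 64x64 image yields 9 windows at scale 1.0 and 1 window at scale 0.5.
print(len(list(sliding_windows(64, 64, scales=(1.0, 0.5)))))  # → 10
```

The sketch also illustrates the main cost of this family of methods: the number of classifier evaluations grows with the number of positions times the number of scales, which is one motivation for the region-proposal approach taken in this paper.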