Abstract
Scene text detection is a crucial step in end-to-end scene text recognition, a highly challenging problem in computer vision. This paper proposes a novel scene text detection method that combines a superpixel-based stroke feature transform (SSFT) with deep learning based region classification (DLRC). The SSFT is developed for candidate character region (CCR) extraction. It partitions an input image into regions via superpixel-based clustering, discards most of these regions using predefined criteria that characters satisfy, and refines the remaining regions into CCRs by computing a stroke width map. DLRC then identifies the character regions among the CCRs: several hand-crafted low-level features (color, texture, and geometric features) and some deep convolutional neural network (CNN) based high-level features are first extracted from each region, and these features are then fused by two fully connected networks (FCNs) for region classification. In the DLRC step, the CNN used for deep feature extraction and the FCNs used for feature fusion are trained jointly. Next, the extracted character regions are merged into candidate text regions, from which the final scene text is detected. The proposed method is evaluated on three publicly available datasets, ICDAR2011, ICDAR2013, and Street View Text, on which it achieves F-measures of 0.876, 0.885, and 0.631, respectively, demonstrating the effectiveness of the proposed scene text detection method.
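To make the stroke-width refinement of CCRs more concrete, the following is a minimal Python sketch of how a stroke width map can be used to filter a candidate region. It is a simplified stand-in, not the SSFT itself: it assumes the region is already available as a binary mask (the superpixel clustering step is not shown), it approximates stroke width using only horizontal and vertical runs instead of full stroke width transform rays, and the function names and the `max_variation=0.3` threshold are hypothetical choices for illustration.

```python
from statistics import mean, pstdev


def stroke_width_map(mask):
    """Approximate per-pixel stroke width of a binary region (hypothetical helper).

    Simplification: each foreground pixel gets the shorter of its horizontal
    and vertical run lengths; background pixels get 0. Pixels at stroke
    junctions or corners may be overestimated.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    horiz = [[0] * w for _ in range(h)]
    vert = [[0] * w for _ in range(h)]
    # Horizontal runs: write each run's length into every pixel of that run.
    for y in range(h):
        x = 0
        while x < w:
            if mask[y][x]:
                start = x
                while x < w and mask[y][x]:
                    x += 1
                for i in range(start, x):
                    horiz[y][i] = x - start
            else:
                x += 1
    # Vertical runs: same idea, scanning each column.
    for x in range(w):
        y = 0
        while y < h:
            if mask[y][x]:
                start = y
                while y < h and mask[y][x]:
                    y += 1
                for j in range(start, y):
                    vert[j][x] = y - start
            else:
                y += 1
    # Stroke width estimate: the narrower of the two axis-aligned runs.
    return [[min(horiz[y][x], vert[y][x]) if mask[y][x] else 0
             for x in range(w)] for y in range(h)]


def stroke_width_variation(swm):
    """Coefficient of variation (std / mean) of stroke widths in a region."""
    widths = [v for row in swm for v in row if v > 0]
    if not widths:
        return float("inf")  # empty region: never accept it
    return pstdev(widths) / mean(widths)


def is_candidate_character(mask, max_variation=0.3):
    """Keep regions with roughly constant stroke width (hypothetical threshold)."""
    return stroke_width_variation(stroke_width_map(mask)) <= max_variation


# Usage: a uniform bar passes; a block with a thin attached line does not.
BAR = [[1] * 7 for _ in range(3)]
BLOB = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
]
print(is_candidate_character(BAR), is_candidate_character(BLOB))  # True False
```

The design relies on the common observation that characters tend to have near-constant stroke width, so a low coefficient of variation is used as an acceptance criterion. In the abstract this is only one part of the CCR stage; region classification itself is done afterward by DLRC.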