Abstract

Text detection is a prerequisite for text recognition in scene images. Previous segmentation-based methods for scene text detection have achieved promising performance. However, such approaches may produce spurious text instances, because they often confuse the boundaries of dense text instances and then infer word/text-line instances by relying heavily on meticulous heuristic rules. We propose a novel method, Assembling Text Components (AT-text), that accurately detects dense text in scene images. AT-text localizes word/text-line instances in a bottom-up manner by assembling a parsimonious component set. We employ a segmentation model that encodes multi-scale text features, considerably improving the classification accuracy of text/non-text pixels. The candidate text components are finely classified and selected using the discriminative segmentation results. This allows AT-text to efficiently filter out false-positive candidate components and then assemble the remaining text components into different text instances. AT-text works well on multi-oriented and multi-language text without complex post-processing or character-level annotation. Compared with existing works, it achieves satisfactory results and a good balance between precision and recall on the ICDAR2013 and MSRA-TD500 public benchmark datasets.
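The bottom-up flow described in the abstract (segment pixels, extract candidate components, filter false positives, assemble the rest into text instances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds, the use of a plain probability grid in place of the segmentation model's output, and the merging rule (small horizontal gap plus vertical overlap) are all assumptions made for illustration.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground regions of a binary grid as pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(pixels)
    return comps


def bbox(pixels):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a pixel list."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def detect(prob, pixel_thresh=0.5, comp_thresh=0.7, max_gap=1):
    """Hypothetical pipeline: binarize, extract components, filter, assemble."""
    # 1. Binarize the per-pixel text probabilities.
    mask = [[p >= pixel_thresh for p in row] for row in prob]
    # 2. Candidate components are the connected foreground regions.
    comps = connected_components(mask)
    # 3. Drop candidates whose mean text probability is low (assumed filter).
    kept = [c for c in comps if sum(prob[y][x] for y, x in c) / len(c) >= comp_thresh]
    # 4. Assemble: merge boxes left to right when they are horizontally close
    #    and overlap vertically (assumed rule, horizontal text only).
    instances = []
    for b in sorted(bbox(c) for c in kept):
        for i, inst in enumerate(instances):
            gap = b[0] - inst[2] - 1
            overlap = min(inst[3], b[3]) - max(inst[1], b[1]) + 1
            if gap <= max_gap and overlap > 0:
                instances[i] = (min(inst[0], b[0]), min(inst[1], b[1]),
                                max(inst[2], b[2]), max(inst[3], b[3]))
                break
        else:
            instances.append(b)
    return instances


# Two confident components one column apart, plus one weak isolated pixel.
example_prob = [
    [0.9, 0.9, 0.0, 0.8, 0.8, 0.0, 0.0, 0.0, 0.55],
    [0.9, 0.9, 0.0, 0.8, 0.8, 0.0, 0.0, 0.0, 0.0],
]
print(detect(example_prob))             # [(0, 0, 4, 1)]
print(detect(example_prob, max_gap=0))  # [(0, 0, 1, 1), (3, 0, 4, 1)]
```

In this toy input the weak pixel passes binarization but is rejected by the component-level filter, and the two remaining components are merged into one instance only when the allowed gap is large enough.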

Highlights

  • Text reading in scene images is an important task driven by a set of real-world applications, such as image retrieval, license plate recognition, multi-language translation, etc.

  • In the past few decades, scene text detection has attracted a large amount of attention and numerous works have been reported [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]

  • Huang et al. [8] designed a Convolutional Neural Network (CNN) classifier with two convolutional layers to predict character components generated by the MSERs method

Summary

Introduction

Text reading in scene images is an important task driven by a set of real-world applications, such as image retrieval, license plate recognition, multi-language translation, etc. Among the existing methods for scene text detection, there are mainly two prevalent types: detection-based [1,2,3,4,5,6,7,8,9,10,11] and segmentation-based [12,13] algorithms. The former methods, drawing inspiration from general object detection, design several types of anchors to generate candidate regions and filter out false-positive regions to produce accurate bounding boxes for text instances. A segmentation result combined with a single binarization threshold might result in such failures.

[Figure: An example of dense scene text detection. (a) The scene image (b) The text mask (c) The candidate components]
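To make the concern about a single binarization threshold concrete, the following toy sketch counts how many separate text regions survive thresholding along a 1-D slice of a text score map. The score values and the helper `count_runs` are hypothetical and are not taken from the paper; they only illustrate why one global threshold can either merge dense instances or lose faint ones.

```python
def count_runs(scores, thresh):
    """Count maximal runs of consecutive scores that are >= thresh."""
    runs, inside = 0, False
    for s in scores:
        above = s >= thresh
        if above and not inside:
            runs += 1
        inside = above
    return runs


# Two dense text lines separated by a gap that still scores fairly high.
dense = [0.9, 0.95, 0.9, 0.6, 0.9, 0.95, 0.9]
# One faint word next to one confident word.
faint = [0.65, 0.66, 0.1, 0.9, 0.9]

print(count_runs(dense, 0.5))  # 1: the low threshold merges the two lines
print(count_runs(dense, 0.7))  # 2: the higher threshold separates them
print(count_runs(faint, 0.5))  # 2: the low threshold keeps the faint word
print(count_runs(faint, 0.7))  # 1: the higher threshold drops the faint word
```

Neither threshold handles both slices correctly, which is the kind of trade-off that motivates classifying and selecting candidate components rather than relying on one cut-off.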

Related Works
The Proposed Method
Pipeline
Segmentation
Background
Assembling
Datasets and Evaluation
The Network Training
The Ablation Study
Methods
Comparison
The performance of our approach on MSRA-TD500
Conclusions