Abstract

Text detection is a prerequisite for text recognition, and multi-oriented text detection is a current research hotspot. Due to variability in the size, spatial layout, color, and orientation of natural scene text, scene text detection remains very challenging. This paper therefore proposes a simple and fast multi-oriented text detection method. The method first optimizes the regression branch by designing a diagonal adjustment factor that makes position regression more accurate, increasing the F-score by 0.8. Second, an attention module is added to the model, which improves the detection of small text regions and increases the F-score by 1.2. Third, DR Loss is introduced to address the imbalance between positive and negative samples, increasing the F-score by 0.5. Finally, experimental verification and analysis are conducted on the ICDAR 2015, MSRA-TD500, and ICDAR 2013 datasets. The results demonstrate that the method significantly improves the precision and recall of scene text detection and achieves competitive results compared with existing state-of-the-art methods. On ICDAR 2015, the proposed method achieves an F-score of 0.849 at 9.9 fps at 720p resolution; on MSRA-TD500, an F-score of 0.772 at 720p; and on ICDAR 2013, an F-score of 0.887 at 720p.
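The abstract does not reproduce the exact formula of the diagonal adjustment factor, but the general idea of penalizing a box-regression loss with a diagonal-normalized term can be sketched in the style of DIoU: divide the squared distance between box centers by the squared diagonal of the smallest enclosing box. All names below are illustrative, not FDTA's actual implementation.

```python
import numpy as np

def diou_loss(pred, target):
    """Illustrative IoU regression loss with a diagonal adjustment term.

    Boxes are axis-aligned, given as (x1, y1, x2, y2). The penalty
    d^2 / c^2 normalizes the squared center distance by the squared
    diagonal of the smallest box enclosing both boxes, so the loss
    keeps a useful gradient even when the boxes do not overlap.
    """
    # Intersection area of the two boxes
    ix1, iy1 = np.maximum(pred[:2], target[:2])
    ix2, iy2 = np.minimum(pred[2:], target[2:])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + 1e-9)

    # Squared distance between box centers
    cp = np.array([(pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2])
    ct = np.array([(target[0] + target[2]) / 2, (target[1] + target[3]) / 2])
    d2 = np.sum((cp - ct) ** 2)

    # Squared diagonal of the smallest enclosing box
    ex1, ey1 = np.minimum(pred[:2], target[:2])
    ex2, ey2 = np.maximum(pred[2:], target[2:])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

    return 1.0 - iou + d2 / c2
```

For perfectly overlapping boxes the loss is near zero, while for disjoint boxes the diagonal term still provides a signal proportional to how far apart they are, which is the property a diagonal-based factor adds over plain IoU loss.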

Highlights

  • Scene text detection has important application value and research significance in real-time translation, image retrieval, scene analysis, geolocation, and navigation for the blind

  • Because FCOS [14] performs well in anchor-free object detection, we propose a multi-oriented scene text detection method based on the attention mechanism

  • Different from the methods above, this paper proposes a multi-oriented scene text detection method based on FCOS


Summary

INTRODUCTION

Scene text detection has important application value and research significance in real-time translation, image retrieval, scene analysis, geolocation, and navigation for the blind. Traditional text detection methods mainly rely on classical machine learning classifiers such as neural networks, SVM, and K-Means. In complex natural scenes with uneven illumination or partial occlusion, these traditional methods still cannot achieve ideal detection performance. Existing deep learning approaches fall into two groups. Regression-based methods such as SSD [11], Faster-RCNN [12], and YOLO [13] obtain detection results by regressing horizontal rectangles, rotated rectangles, or quadrilaterals. Segmentation-based methods usually follow the idea of semantic segmentation, dividing text pixels into different instances and obtaining pixel-level text localization through post-processing. To address these challenges, this paper (Cao et al., "FDTA: Fully Convolutional Scene Text Detection With Text Attention") proposes a method whose main contributions can be summarized as follows:

1) A diagonal adjustment factor is designed for the regression loss function, making position regression more accurate.

2) A text attention module is added, which makes the network attend to useful information and suppress useless information.

3) DR Loss is used to mitigate the imbalance between positive and negative samples, further improving network performance.

4) The method achieves competitive results in terms of speed and accuracy on several standard text detection benchmarks.
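This summary does not describe the internals of the text attention module, but the stated goal of emphasizing useful channels and suppressing useless ones is what squeeze-and-excitation style channel attention does. The sketch below is a minimal illustration of that family of modules, with purely illustrative names and random weights; it is not FDTA's actual module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Minimal squeeze-and-excitation style channel attention.

    feat: feature map of shape (C, H, W).
    w1:   (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    Global-average-pool each channel, pass the result through a
    two-layer bottleneck, and rescale each channel by the resulting
    weight in (0, 1), emphasizing informative channels.
    """
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)            # (C,) global context
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # (C,) weights in (0, 1)
    return feat * excite[:, None, None]                   # reweighted feature map

# Toy usage with random weights (reduction ratio r = 2)
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8))
w2 = rng.standard_normal((8, 4))
out = channel_attention(feat, w1, w2)
```

Because the per-channel weights lie in (0, 1), the module can only attenuate channels relative to the input, which is how such a gate "restrains useless information" without changing the feature map's shape.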

RELATED WORKS
LOSS FUNCTION
EXPERIMENTS
Findings
CONCLUSION

