Abstract

Text detection in complex scene images is a challenging task for intelligent transportation. Recently, anchor mechanisms have been widely used in scene text detection. However, in existing methods anchors are generally predefined empirically, which degrades robustness in complex scenarios where text varies widely in size and orientation. In this paper, we propose a novel Attention Anchor Mechanism (AAM) that predicts an appropriate anchor for each pixel. Concretely, we treat a series of predefined anchors as basic anchors and use an attention model to predict a weight for each basic anchor. The predicted anchor at each pixel is then the weighted sum of the basic anchors. In this way, the gap between the predicted anchors and the corresponding ground-truth boxes is narrowed, making the regression easier for the network. To simplify the design of basic anchors, we adopt a dimension-decomposition mechanism that predicts the width, height, and angle of anchors separately. Extensive experiments on several public datasets demonstrate that our method achieves state-of-the-art performance.
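
The weighted-sum step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the softmax normalization of the attention weights, the function name `predict_anchor_dim`, the basic anchor values, and the random logits standing in for the attention model's output are all assumptions made for illustration. The abstract states only that an attention model predicts weights for the basic anchors and that width, height, and angle are handled separately.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict_anchor_dim(logits, basic_values):
    """Weighted sum of basic anchor values at every pixel.

    logits:       (H, W, K) attention scores, one per basic anchor (assumed input)
    basic_values: (K,) predefined basic anchor values for one dimension
    returns:      (H, W) predicted anchor value per pixel
    """
    weights = softmax(logits, axis=-1)   # assumption: weights sum to 1 per pixel
    return weights @ basic_values        # (H, W, K) @ (K,) -> (H, W)

# Hypothetical basic anchors, decomposed by dimension.
basic_widths = np.array([16.0, 32.0, 64.0, 128.0])   # pixels
basic_heights = np.array([8.0, 16.0, 32.0])          # pixels
basic_angles = np.array([-45.0, 0.0, 45.0])          # degrees, limited range

# Random logits stand in for the attention model's per-pixel output.
rng = np.random.default_rng(0)
H, W = 4, 5
width_logits = rng.normal(size=(H, W, basic_widths.size))
height_logits = rng.normal(size=(H, W, basic_heights.size))
angle_logits = rng.normal(size=(H, W, basic_angles.size))

# One predicted anchor (width, height, angle) per pixel.
anchors = np.stack(
    [
        predict_anchor_dim(width_logits, basic_widths),
        predict_anchor_dim(height_logits, basic_heights),
        predict_anchor_dim(angle_logits, basic_angles),
    ],
    axis=-1,
)  # shape (H, W, 3)
```

Under the softmax assumption, each predicted value is a convex combination of the basic values, so it always lies between the smallest and largest basic anchor in that dimension. Averaging angles linearly is only reasonable over a limited range without wraparound, which this sketch assumes.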
