Abstract
This paper introduces a novel bi-directional convolutional framework to address the large-variance scale problem in scene text detection. Because recent CNN-based methods lack scale normalization, text instances whose sizes vary widely are activated inconsistently in the feature maps, which makes it hard for these methods to accurately locate multi-size text instances. We therefore propose the relationship network (R-Net), which maps multi-scale convolutional features to a scale-invariant space to obtain consistent activation of multi-size text instances. First, we implement an FPN-like backbone with a Spatial Relationship Module (SPM) to extract multi-scale features with powerful spatial semantics. Then, a Scale Relationship Module (SRM) built on the feature pyramid propagates contextual scale information through the sequence of features by a bi-directional convolutional operation. SRM supplements the multi-scale information in the different feature maps so that multi-size text instances are activated consistently. Compared with previous approaches, R-Net effectively handles the large-variance scale problem without complicated post-processing or complex hand-crafted hyperparameter settings. Extensive experiments on several benchmarks verify that R-Net obtains state-of-the-art performance in both accuracy and efficiency. Specifically, R-Net achieves an F-measure of 85.6% at 21.4 frames/s on ICDAR 2015 and an F-measure of 81.7% at 11.8 frames/s on MSRA-TD500. The code is available at https://github.com/wangyuxin87/R-Net.
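To make the idea of bi-directional propagation over a feature pyramid concrete, the following is a minimal NumPy sketch. It is not the authors' SRM: the abstract does not specify the layer design, so the function names (`bidirectional_pass`, `conv1x1`, `upsample2x`, `downsample2x`), the use of 1x1 channel mixing, the ReLU, and the additive fusion are all assumptions chosen for illustration. It only shows how information can flow from fine to coarse levels and back, so that every level ends up carrying context from every other scale.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """2x2 average pooling of a (C, H, W) feature map (H and W assumed even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def conv1x1(x, weight):
    """1x1 convolution, i.e. per-pixel channel mixing; weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def bidirectional_pass(pyramid, w_down, w_up):
    """Hypothetical bi-directional propagation over a feature pyramid.

    pyramid: list of (C, H, W) arrays ordered fine to coarse, where each
             level has half the spatial resolution of the previous one.
    w_down, w_up: (C, C) channel-mixing weights (illustrative assumption).
    """
    # Forward direction: carry context from fine levels to coarse levels.
    fwd = [pyramid[0]]
    for feat in pyramid[1:]:
        message = np.maximum(conv1x1(downsample2x(fwd[-1]), w_down), 0.0)
        fwd.append(feat + message)

    # Backward direction: carry context from coarse levels to fine levels.
    bwd = [pyramid[-1]]
    for feat in reversed(pyramid[:-1]):
        message = np.maximum(conv1x1(upsample2x(bwd[-1]), w_up), 0.0)
        bwd.append(feat + message)
    bwd.reverse()  # restore fine-to-coarse ordering

    # Fuse both directions so each level sees information from all scales.
    return [f + b for f, b in zip(fwd, bwd)]
```

Each output level keeps the shape of its input level, so a detection head applied afterwards would not need to change. A learned version would replace the fixed weights with trainable convolutions.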