Abstract
We propose a text detection method, based on Faster R-CNN, that can be trained consistently end-to-end. The original Faster R-CNN is an end-to-end convolutional neural network (CNN) for fast and accurate object detection. Taking the characteristics of text into account, we propose a novel architecture that exploits this object detection capability. The original Faster R-CNN generates regions of interest (RoIs) with a region proposal network (RPN) that operates on the feature map of the last convolutional layer. In contrast, the proposed method generates RoIs with multiple RPNs that operate on the feature maps of multiple convolutional layers. Using these multiresolution feature maps, the method detects texts of various sizes simultaneously. To aggregate the RoIs, we introduce an RoI-merge layer, which effectively selects valid RoIs from the multiple RPNs. In addition, we propose a training strategy that enables end-to-end training and specializes each RPN in a particular text region size. Experimental results on the ICDAR2013/2015 RRC test datasets show that, compared with the original Faster R-CNN and recent methods, the proposed Multi-RPN method improves detection scores while keeping almost the same detection speed.
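The abstract does not say how the RoI-merge layer decides which RoIs are valid. The following is a minimal sketch under an explicitly assumed scheme: it pools the proposals from every RPN, applies non-maximum suppression (NMS) across the pooled set so that duplicates found by several RPNs collapse into one, and keeps the top-N survivors by objectness score. The names `roi_merge` and `iou` and the parameters `iou_thresh` and `top_n` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in image coordinates.
Box = Tuple[float, float, float, float]
Proposal = Tuple[Box, float]  # (box, objectness score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def roi_merge(proposals_per_rpn: List[List[Proposal]],
              iou_thresh: float = 0.7,
              top_n: int = 300) -> List[Proposal]:
    """Assumed RoI-merge: pool all RPN outputs, then greedy NMS + top-N."""
    # Concatenate the proposals produced by each RPN (one list per feature map).
    pooled = [p for rpn in proposals_per_rpn for p in rpn]
    # Visit proposals from highest to lowest objectness score.
    pooled.sort(key=lambda p: p[1], reverse=True)
    kept: List[Proposal] = []
    for box, score in pooled:
        # Keep a box only if it does not overlap an already-kept box too much.
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
            if len(kept) == top_n:
                break
    return kept
```

For example, with one RPN on a fine feature map proposing small boxes and another on a coarse map proposing a large box, the overlapping small proposals are suppressed and one box of each size survives:

```python
rpn_fine = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.6)]
rpn_coarse = [((0, 0, 100, 50), 0.8), ((0, 0, 10, 10), 0.5)]
merged = roi_merge([rpn_fine, rpn_coarse], iou_thresh=0.5)
```

Merging before a shared NMS step, rather than running NMS per RPN only, is one plausible way to drop duplicates that two RPNs report for the same text region; the paper may use a different selection rule.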