Abstract
In recent years, images have played an increasingly important role in daily life and social communication. The textual information contained in images is an important cue for understanding the content of the scenes themselves: the more accurately text in natural scenes is detected, the more accurate our semantic understanding of the images will be. Scene text detection has therefore become a hot topic in computer vision. In this paper, we present a modified text detection network based on further study and improvement of the Connectionist Text Proposal Network (CTPN) proposed by previous researchers. To extract deeper features that are less sensitive to variations across images, we replace the Visual Geometry Group network (VGGNet) used in the original network with a Residual Network (ResNet). To improve the robustness of the model across multiple languages, we train on the multi-lingual scene text detection and script identification dataset (MLT) from the 2017 International Conference on Document Analysis and Recognition (ICDAR2017). In addition, an attention mechanism is used to obtain a more reasonable weight distribution over features. The proposed model achieves an F1-score of 0.91 on the ICDAR2011 test set, outperforming a CTPN trained on the same data by about 5%.
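The abstract describes two architectural changes to CTPN: swapping the VGGNet backbone for a ResNet and adding an attention mechanism over the recurrent features. The sketch below illustrates one plausible way to wire these pieces together in PyTorch; the layer names, the additive-attention form, and the anchor count (10 vertical anchors per position, as in the original CTPN) are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torchvision


class ResNetCTPN(nn.Module):
    """Illustrative CTPN-style detector with a ResNet-50 backbone replacing VGG16.

    Hypothetical sketch: the attention form and head dimensions are assumed,
    not taken from the paper.
    """

    def __init__(self, num_anchors=10, hidden=128):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # Keep only the convolutional stages; drop the avgpool and fc layers.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # (N, 2048, H/32, W/32)
        self.reduce = nn.Conv2d(2048, 512, kernel_size=3, padding=1)
        # CTPN slides a bidirectional recurrent layer along each feature-map row.
        self.rnn = nn.GRU(512, hidden, bidirectional=True, batch_first=True)
        # Simple additive attention over the recurrent outputs (assumed form).
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, 512)
        self.cls = nn.Conv2d(512, 2 * num_anchors, kernel_size=1)  # text / non-text scores
        self.reg = nn.Conv2d(512, 2 * num_anchors, kernel_size=1)  # vertical coordinates

    def forward(self, x):
        f = self.reduce(self.backbone(x))                 # (N, 512, H', W')
        n, c, h, w = f.shape
        seq = f.permute(0, 2, 3, 1).reshape(n * h, w, c)  # one sequence per row
        out, _ = self.rnn(seq)                            # (N*H', W', 2*hidden)
        weights = torch.softmax(self.attn(out), dim=1)    # attention weights along each row
        out = out * weights                               # re-weight each column's features
        out = self.fc(out).reshape(n, h, w, 512).permute(0, 3, 1, 2)
        return self.cls(out), self.reg(out)


if __name__ == "__main__":
    model = ResNetCTPN()
    scores, boxes = model(torch.randn(1, 3, 224, 224))
    print(scores.shape, boxes.shape)  # torch.Size([1, 20, 7, 7]) each
```

Under these assumptions, the only change relative to a VGG-based CTPN is the backbone and the attention re-weighting step; the anchor-based text/non-text classification and vertical regression heads are kept as in the original design.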