Abstract

In practice, text detection is needed for document image recognition, where the images contain long text, large text and dense small text areas. The Connectionist Text Proposal Network (CTPN) is a classical text detection model, but it struggles to detect dense small text areas. To overcome this challenge, this paper proposes a text detection model based on CTPN. The proposed model has the following components: a residual network (ResNet50) and a Feature Pyramid Network (FPN) extract feature layers that carry both high-level semantic information and shallow detail information; a Bi-directional Long Short-Term Memory (BiLSTM) network augments the representation of context information in the multi-scale feature layers; text boxes are predicted from the feature layer at each scale, so that text areas of various scales are detected effectively; the ground-truth bounding box of each text box is matched to the most appropriate anchors using a centralized approach; and the bounding box of each text line is obtained by a post-processing method for text line construction. In particular, our experiments focus on text detection for Chinese business licenses. The experimental results show that the proposed model is more effective than CTPN: it achieves a higher F-score while using only one third as much training data as CTPN. Furthermore, the proposed model works well on images that contain long text, large text and dense small text areas simultaneously, where CTPN fails.
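The abstract does not spell out the text-line construction step. The sketch below shows one common way to merge narrow per-anchor text proposals into line boxes, in the spirit of CTPN's post-processing. It is a simplified illustration and not the authors' method. The function names `vertical_overlap` and `build_text_lines`, the box format `(x1, y1, x2, y2)`, and the thresholds `max_gap=50` and `min_v_overlap=0.7` are all assumptions. It also substitutes union-find grouping for CTPN's bidirectional nearest-neighbour pairing.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


def vertical_overlap(a: Box, b: Box) -> float:
    """Vertical intersection length divided by the smaller box height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    min_h = min(a[3] - a[1], b[3] - b[1])
    return max(0.0, inter) / min_h if min_h > 0 else 0.0


def build_text_lines(proposals: List[Box], max_gap: float = 50.0,
                     min_v_overlap: float = 0.7) -> List[Box]:
    """Group proposals into text lines (assumed thresholds, simplified rule).

    Two proposals are linked when the horizontal gap between them is at most
    max_gap and their vertical overlap ratio is at least min_v_overlap.
    Linked proposals are merged with union-find, and each group becomes one
    line box spanning all of its members.
    """
    n = len(proposals)
    parent = list(range(n))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Sort by left edge so the gap to later proposals only grows.
    order = sorted(range(n), key=lambda k: proposals[k][0])
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            gap = proposals[j][0] - proposals[i][2]
            if gap > max_gap:
                break  # every later proposal is even farther to the right
            if vertical_overlap(proposals[i], proposals[j]) >= min_v_overlap:
                union(i, j)

    groups: Dict[int, List[Box]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(proposals[i])

    lines = [(min(b[0] for b in g), min(b[1] for b in g),
              max(b[2] for b in g), max(b[3] for b in g))
             for g in groups.values()]
    # Reading order: top to bottom, then left to right.
    return sorted(lines, key=lambda b: (b[1], b[0]))


if __name__ == "__main__":
    props = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30),
             (200, 10, 216, 30), (0, 60, 16, 80)]
    print(build_text_lines(props))
```

In this example, the first three 16-pixel-wide proposals are adjacent and vertically aligned, so they merge into one line. The proposal at x = 200 is too far away to join them, and the proposal at y = 60 lies on a different row. A real detector would also filter proposals by score and refine the line's vertical extent. Both steps are omitted here.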
