Abstract

EAST [1] is a popular end-to-end text detector. However, it performs poorly on long or large text instances because of the network's limited feature extraction capacity and narrow receptive fields. Building on EAST, we propose an approach named Enhanced EAST. First, we enrich the low-level feature layers with semantic information by propagating information from high levels to low levels, which reduces the information gap between layers. Second, we use a two-stream large-kernel convolution to enlarge the receptive field at reasonable computational cost, thereby improving the network's feature detection and fusion ability. In addition, we optimize the label generation for the training data and design a weighted mask for each text instance, which guides training to strengthen the network's perception of complete text shapes, so that the predicted text boxes are located more accurately. Finally, we perform data equalization and augmentation in the experiments. Results on the ICDAR 2015, MSRA-TD500, and ICDAR 2017 MLT datasets demonstrate that the proposed algorithm achieves state-of-the-art performance in multi-oriented scene text detection.
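
The "reasonable computational cost" claim for a two-stream large-kernel convolution can be illustrated with a rough weight count. The abstract does not describe the layer layout, so the sketch below assumes a common design in which one stream applies a k x 1 convolution followed by a 1 x k convolution, the other applies them in the opposite order, and the two outputs are summed. The function names, channel widths, and kernel size are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Rough cost comparison under an ASSUMED two-stream layout:
#   stream A: (k x 1) conv -> (1 x k) conv
#   stream B: (1 x k) conv -> (k x 1) conv
# The abstract does not specify the actual layers; all names are hypothetical.


def dense_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in one dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def two_stream_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in two parallel streams of stacked 1-D convolutions (bias ignored)."""
    # In each stream the first 1-D conv maps c_in -> c_out,
    # and the second maps c_out -> c_out.
    per_stream = k * c_in * c_out + k * c_out * c_out
    return 2 * per_stream


def receptive_field(kernel_sizes: list[int]) -> int:
    """Receptive field along one axis for stacked stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


# Illustrative numbers only: 32 input and output channels, kernel size 7.
dense = dense_conv_params(32, 32, 7)            # 50176 weights
two_stream = two_stream_conv_params(32, 32, 7)  # 28672 weights

# Stream A has kernel heights [7, 1] and kernel widths [1, 7],
# so it covers a 7 x 7 window, the same as the dense kernel.
rf_height = receptive_field([7, 1])
rf_width = receptive_field([1, 7])

print(dense, two_stream, rf_height, rf_width)
```

Under these assumptions both streams cover the same k x k window as a dense kernel, while the weight count grows linearly rather than quadratically in k, so the saving widens as the kernel gets larger.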
