Abstract

Robust scene text detection is a difficult and important challenge in computer vision. Most previous methods detect arbitrary-shaped text using complicated post-processing steps. In this paper, we propose a fast, trainable network for arbitrary-shaped text detection that uses a text discriminator and shares visual information between the two complementary tasks. Specifically, we extend PSENet [1] by adding a text discriminator that fuses the multiple predictions for each text instance, replacing the time-consuming post-processing steps. Because the text discriminator shares visual information with the text detection network, our approach achieves much faster detection than PSENet while maintaining accuracy similar to that reported for PSENet. Furthermore, the text discriminator effectively reduces false alarms. Experiments on the ICDAR 2017 MLT, ICDAR 2015, and ICDAR 2019 ArT datasets demonstrate that the proposed approach achieves nearly real-time detection speed while keeping state-of-the-art detection accuracy.
