
In this paper, we present a new data-driven method for pixel-level scene text segmentation from a single natural image. Although scene text detection, i.e., producing a text region mask, has been well studied in the past decade, pixel-level text segmentation remains an open problem because massive pixel-level labeled data for supervised training is lacking. To tackle this issue, we incorporate text region masks as auxiliary data, since acquiring large-scale labeled text region masks is typically less expensive and less time-consuming. Specifically, we propose a mutually guided network that produces a polygon-level mask in one branch and a pixel-level text mask in the other. The outputs of the two branches serve as guidance for each other, and the whole network is trained with a semi-supervised learning strategy. Extensive experiments demonstrate the effectiveness of our mutually guided network, and the results show that it outperforms the state of the art in pixel-level scene text segmentation. We also demonstrate that the mask produced by our network can improve text recognition performance, in addition to its straightforward use in image editing.
