Semi-Supervised Pixel-Level Scene Text Segmentation by Mutually Guided Network.

Shan Zhao,Jue Wang,Shuaicheng Liu,Chuan Wang,Li Zhu,Yanwen Guo,Kunming Luo

doi:10.1109/tip.2021.3113157

Abstract

In this paper we present a new data-driven method for pixel-level scene text segmentation from a single natural image. Although scene text detection, i.e. producing a text region mask, has been well studied in the past decade, pixel-level text segmentation is still an open problem due to the lack of massive pixel-level labeled data for supervised training. To tackle this issue, we incorporate text region mask as an auxiliary data into this task, considering acquiring large-scale of labeled text region mask is commonly less expensive and time-consuming. To be specific, we propose a mutually guided network which produces a polygon-level mask in one branch and a pixel-level text mask in the other. The two branches' outputs serve as guidance for each other and the whole network is trained via a semi-supervised learning strategy. Extensive experiments are conducted to demonstrate the effectiveness of our mutually guided network, and experimental results show our network outperforms the state-of-the-art in pixel-level scene text segmentation. We also demonstrate the mask produced by our network could improve the text recognition performance besides the trivial image editing application.

Full Text