Abstract

Referring image segmentation (RIS) has achieved impressive results with fully convolutional networks (FCNs). However, previous RIS methods require a large number of pixel-level annotations. In this article, we present a weakly supervised RIS method that uses only bounding box (BB) annotations. In the first stage, we introduce an adversarial boundary loss to extract the object contour from the BB, which is then used to select appropriate region proposals for pseudoground-truth (PGT) generation. In the second stage, we design a co-training (Co-T) strategy to purify the pseudolabels. Specifically, we train two networks and interactively guide them to select clean labels for each other, which weakens the effect of noisy labels on model training. Experimental results on four benchmark datasets demonstrate that the proposed method produces high-quality masks at a speed of 63 frames/s.
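The abstract does not spell out how the two networks exchange labels, so the following is only a minimal sketch of one common co-training scheme for noisy labels (small-loss selection, as in Co-teaching): each network ranks samples by its own loss, and the lowest-loss subset is handed to the peer network for training. All function names, the `keep_ratio` parameter, and the small-loss criterion are assumptions for illustration, not the paper's actual procedure.

```python
def select_clean(losses, keep_ratio):
    """Hypothetical small-loss selection: return indices of the samples
    with the lowest loss, which are assumed to carry cleaner labels."""
    k = max(1, int(len(losses) * keep_ratio))
    return sorted(range(len(losses)), key=lambda i: losses[i])[:k]

def co_training_step(losses_a, losses_b, keep_ratio=0.5):
    """Illustrative Co-T exchange: network A's low-loss picks are used to
    train network B, and vice versa, so neither network reinforces its
    own label noise."""
    clean_for_b = select_clean(losses_a, keep_ratio)  # A selects for B
    clean_for_a = select_clean(losses_b, keep_ratio)  # B selects for A
    return clean_for_a, clean_for_b

# Toy per-sample losses from the two networks on the same minibatch.
losses_a = [0.1, 2.0, 0.2, 1.5]
losses_b = [0.3, 0.1, 2.5, 0.2]
idx_for_a, idx_for_b = co_training_step(losses_a, losses_b)
print(sorted(idx_for_b))  # A's cleanest samples, used to train B
print(sorted(idx_for_a))  # B's cleanest samples, used to train A
```

The cross-exchange is the key design choice: because the two networks start from different initializations, they tend to make different mistakes, so labels one network finds easy can correct errors the other would otherwise memorize.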
