Abstract

This paper studies a new yet practical setting of semi-supervised semantic segmentation, i.e., hybrid-supervised semantic segmentation, where a small number of pixel-level (strong) annotations and a large number of image-level (weak) annotations are provided. It is common practice to use pseudo labels to mitigate the lack of strong annotations. However, most existing works focus on improving the model representation with unlabeled data while ignoring the quality of the pseudo labels, leading to poor segmentation performance. It is difficult to directly learn a model from limited images that produces high-quality pseudo labels. To address this problem, we propose a novel learning method, i.e., Transformer-based Refinement Learning (TRL), which explores a learning process under the assistance of weak annotations and the supervision of strong annotations. TRL progressively refines heat maps from poor quality to higher quality to obtain satisfactory pseudo labels. Specifically, we propose a Dual-Cross Transformer Network (DCTN) to perform the refinement learning. DCTN extracts features from both images and heat maps with a dual-stream network, and cross-attention modules inside DCTN hierarchically fuse the dual-stream features. Experiments on the PASCAL VOC and COCO datasets show that TRL outperforms state-of-the-art methods for hybrid-supervised semantic segmentation.
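For readers unfamiliar with dual-stream cross-attention fusion, the following is a minimal, hypothetical sketch (not the authors' code) of how one DCTN-style fusion block could be realized: an image-feature stream and a heat-map-feature stream attend to each other via standard multi-head cross-attention; all module and parameter names are illustrative assumptions.

```python
# Hypothetical sketch of one dual-cross fusion block: each stream queries the
# other via standard multi-head cross-attention, then adds a residual + norm.
import torch
import torch.nn as nn


class DualCrossFusionBlock(nn.Module):
    """Fuses an image-feature stream and a heat-map-feature stream."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Image stream queries the heat-map stream, and vice versa.
        self.img_to_heat = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.heat_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_heat = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, heat_tokens: torch.Tensor):
        # img_tokens, heat_tokens: (batch, num_tokens, dim)
        img_fused, _ = self.img_to_heat(img_tokens, heat_tokens, heat_tokens)
        heat_fused, _ = self.heat_to_img(heat_tokens, img_tokens, img_tokens)
        img_out = self.norm_img(img_tokens + img_fused)
        heat_out = self.norm_heat(heat_tokens + heat_fused)
        return img_out, heat_out


if __name__ == "__main__":
    block = DualCrossFusionBlock()
    img = torch.randn(2, 196, 256)   # image-stream tokens
    heat = torch.randn(2, 196, 256)  # heat-map-stream tokens
    out_img, out_heat = block(img, heat)
    print(out_img.shape, out_heat.shape)  # both (2, 196, 256)
```

Stacking several such blocks at multiple feature scales would correspond to the hierarchical fusion described in the abstract; the exact depth, dimensions, and token layout used by DCTN are not specified here.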
