Abstract

Weakly supervised semantic segmentation under image-level label supervision has undergone impressive improvements over the past years. These approaches can significantly reduce human annotation efforts, although they remain inferior to fully supervised procedures. In this paper, we propose a novel framework that iteratively refines pixel-level annotations and optimizes segmentation network. We first produce initial deep cues using the combination of activation maps and a saliency map. To produce high-quality pixel-level annotations, a graphical model is constructed over optimal segmentation of high-quality region hierarchies to propagate information from deep cues to unmarked regions. In the training process, the initial pixel-level annotations are used as supervision to train the segmentation network and to predict segmentation masks. To correct inaccurate labels of segmentation masks, we use these segmentation masks with the graphical model to produce accurate pixel-level annotations and use them as supervision to retrain the segmentation network. Experimental results show that the proposed method can significantly outperform the weakly-supervised semantic segmentation methods using static labels. The proposed method has state-of-the-art performance, which are $$66.7\%$$ mIoU score on PASCAL VOC 2012 test set and $$27.0\%$$ mIoU score on MS COCO validation set.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.