Abstract

Weakly supervised semantic segmentation (WSSS) using only image-level labels can greatly reduce the annotation cost and therefore has attracted considerable research interest. However, its performance is still inferior to the fully supervised counterparts. To mitigate the performance gap, we propose a saliency guided self-attention network (SGAN) to address the WSSS problem. The introduced self-attention mechanism is able to capture rich and extensive contextual information but may mis-spread attentions to unexpected regions. In order to enable this mechanism to work effectively under weak supervision, we integrate class-agnostic saliency priors into the self-attention mechanism and utilize class-specific attention cues as an additional supervision for SGAN. Our SGAN is able to produce dense and accurate localization cues so that the segmentation performance is boosted. Moreover, by simply replacing the additional supervisions with partially labeled ground-truth, SGAN works effectively for semi-supervised semantic segmentation as well. Experiments on the PASCAL VOC 2012 and COCO datasets show that our approach outperforms all other state-of-the-art methods in both weakly and semi-supervised settings.

Highlights

  • IntroductionBased upon the fundamental Fully Convolutional Networks (FCNs) [1], various techniques such as dilated convolution [2], spatial pyramid pooling [3], and encoder-decoders [4] have been developed in the last decade

  • Semantic segmentation aims to predict a semantic label for each pixel in an image

  • In order to prevent our SGAN from mis-spreading attentions among objects of different categories, we propose to integrate the class-specific attention cues obtained by the class activation maps (CAMs) method [11] from a classification network as additional supervision

Read more

Summary

Introduction

Based upon the fundamental Fully Convolutional Networks (FCNs) [1], various techniques such as dilated convolution [2], spatial pyramid pooling [3], and encoder-decoders [4] have been developed in the last decade. These techniques gradually improve segmentation accuracy via exploiting extensive contextual information. The above-mentioned methods have achieved high performance in semantic segmentation, they all work under full supervision This supervision manner requires a large amount of pixel-wise annotations for training, which are very expensive and time-consuming

Objectives
Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.