Abstract

Weakly-supervised semantic segmentation (WSSS) aims to train a semantic segmentation network using weak labels. Recent approaches generate a pseudo-label from the image-level label and then exploit it as pixel-level supervision when training the segmentation network. A potential drawback of these conventional WSSS approaches is that the pseudo-label cannot accurately express the object regions and their classes, which degrades the segmentation performance. In this paper, we propose a new WSSS technique that trains the segmentation network without relying on the pseudo-label. The key idea of the proposed approach is to train the segmentation network such that the object erased by the segmentation map is not detected by the classification network. From extensive experiments on the PASCAL VOC 2012 benchmark dataset, we demonstrate that our approach is effective in WSSS.
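
The erasing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the loss form (binary cross-entropy against a target of zero), the function names, and the brightness-based `toy_classifier` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def erase(image, mask):
    """Suppress pixels where the soft segmentation mask for a class is high.

    image and mask are H x W nested lists with values in [0, 1].
    """
    return [[px * (1.0 - m) for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

def erasing_loss(prob_after_erasing, eps=1e-7):
    """Assumed loss: binary cross-entropy against target 0.

    It is small when the classifier no longer detects the erased class.
    """
    p = min(max(prob_after_erasing, eps), 1.0 - eps)
    return -math.log(1.0 - p)

def toy_classifier(image):
    """Hypothetical stand-in classifier: scores the class by mean brightness."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count

# The "object" is the bright right-hand column of a 2 x 2 image.
image = [[0.0, 1.0],
         [0.0, 1.0]]
good_mask = [[0.0, 1.0], [0.0, 1.0]]  # covers the object
bad_mask = [[1.0, 0.0], [1.0, 0.0]]   # covers the background instead

good_loss = erasing_loss(toy_classifier(erase(image, good_mask)))
bad_loss = erasing_loss(toy_classifier(erase(image, bad_mask)))
print(good_loss < bad_loss)
```

In this toy setting, a mask that covers the object removes the evidence the classifier relies on, so the loss is lower than for a mask that misses the object. That is the training signal the abstract describes, obtained without any pixel-level pseudo-label.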

Highlights

  • Image semantic segmentation, a task to classify each pixel among the classes of interest, is an important problem with a wide range of applications such as autonomous driving, medical diagnosis, industrial automation, and aerial imaging [1], [2].

  • From numerical experiments on the val and test sets of the PASCAL VOC 2012 semantic segmentation benchmark [17], we show that our approach achieves a mean intersection-over-union (mIoU) of 65.5% and 65.4% using a VGG16-based network and 67.9% and 68.2% using a ResNet101-based network, respectively, which is competitive with the state of the art.

  • The proposed approach is similar to the class activation map (CAM)-based approach in the sense that it identifies the object regions from the CAM.
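
For readers unfamiliar with the metric reported in the highlights above, the following is a simplified sketch of mean intersection-over-union. It operates on flattened label lists for a single prediction; the official PASCAL VOC evaluation accumulates counts over the whole dataset before averaging, so this is illustrative only. The function name and the choice to skip classes absent from both prediction and ground truth are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are equal-length lists of integer class labels.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both lists
            ious.append(inter / union)
    if not ious:
        return 0.0
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
print(mean_iou(pred, target, num_classes=2))
```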


Summary

INTRODUCTION

Image semantic segmentation, a task to classify each pixel among the classes of interest, is an important problem with a wide range of applications such as autonomous driving, medical diagnosis, industrial automation, and aerial imaging [1], [2]. The class assigned to each pixel of the pseudo-label might not be correct when an image contains multiple objects with distinct classes (see Fig. 1(b)), since the class activation maps (CAMs) spread to unwanted regions outside the foreground objects. For these reasons, an approach that trains the semantic segmentation network using the pseudo-label might not achieve satisfactory performance in many practical scenarios. The class activation mapping technique, which finds the most discriminative object regions, has been used to generate a pixel-level pseudo-label from the image-level label [14]. In [29], a two-phase learning strategy has been proposed to obtain the complete region of the foreground objects from the attention maps of two networks. The drawback of these approaches is that it is difficult to determine whether the masked image still contains part of the foreground objects. In the training loss, λcls is the weighting factor for balancing the two losses.
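
As background for the class activation mapping step mentioned above, here is a minimal sketch of the standard CAM computation: a weighted sum of the final convolutional feature maps using the classifier weights of one class, followed by thresholding to obtain a rough foreground mask. The function names and the 0.5 threshold are illustrative assumptions and are not taken from the paper.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of K feature maps (each H x W) with K weights for one class."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(features, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam

def normalize(cam):
    """Min-max scale the map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]

def foreground_mask(cam, threshold=0.5):
    """Mark pixels whose normalized activation reaches the assumed threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in normalize(cam)]

# Two 2 x 2 feature maps and their (hypothetical) weights for one class.
features = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 2.0]]]
weights = [1.0, 0.5]
cam = class_activation_map(features, weights)
print(foreground_mask(cam))
```

Because the mask comes from thresholding activations, it tends to cover only the most discriminative parts of an object, or to leak into the background, which is the limitation of pseudo-labels that the summary points out.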

SALIENCY MAP REFINEMENT
TRAINING OF SEGMENTATION NETWORK
ABLATION STUDIES
Findings
CONCLUSION
