Abstract

Weakly supervised semantic segmentation (WSSS) mainly relies on class activation maps (CAMs) to generate pseudo masks from image-level labels alone. Many recent works refine the conventional CAM by learning semantic correlation, yet the CAM identifies only sparse, discriminative semantic regions, which severely weakens the subsequent learning of spatial features. To cope with this problem, a spatial correlation-guided learning framework is proposed to exploit the spatial and semantic correlation between adjacent pixels for weakly supervised fine-grained semantic segmentation (WSFGSS). From the spatial perspective, self-supervised multi-view clustering (SMC) is designed to fully mine spatial correlation by clustering over multiple views, including scale, angle and position. Moreover, a hybrid self-supervised (HS) loss function is used to further improve the optimization speed and accuracy of the three spatial representations. From the semantic perspective, an affinity matrix describes the semantic similarity between pixels by building a weighted graph, and it is combined with the robust pseudo labels above into a probability transition matrix. The initial CAM is then gradually corrected through iterative optimization with the random walk algorithm. Finally, the refined CAM is used as supervision for training a standard segmentation network. Extensive experimental results on the BSDS500, PASCAL VOC 2012 and MS COCO datasets show that the proposed SMC method obtains more accurate pseudo labels than recent unsupervised segmentation models. Meanwhile, with these pseudo labels, the proposed fine-grained framework achieves state-of-the-art performance for WSFGSS.
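The semantic refinement step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the affinity is computed or how the pseudo labels enter the transition matrix, so the cosine-similarity affinity, the sharpening exponent `beta`, the same-cluster masking, the number of steps, and the function names `transition_matrix` and `random_walk` are all assumptions made for illustration.

```python
import numpy as np


def transition_matrix(features, clusters, beta=4.0):
    """Build a row-stochastic transition matrix over N pixels.

    features: (N, D) per-pixel feature vectors (hypothetical input).
    clusters: (N,) integer pseudo labels, e.g. from a clustering step.
    beta:     assumed exponent that sharpens the affinities.
    """
    # Cosine similarity between every pair of pixels, clipped to be non-negative.
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    affinity = np.clip(unit @ unit.T, 0.0, None) ** beta

    # Assumption: pseudo labels act as a mask, so a walk can only move
    # between pixels that share the same cluster.
    same_cluster = clusters[:, None] == clusters[None, :]
    affinity = affinity * same_cluster

    # Normalize each row so it sums to 1 (guarding against empty rows).
    row_sums = np.maximum(affinity.sum(axis=1, keepdims=True), 1e-8)
    return affinity / row_sums


def random_walk(cam, transition, steps=8):
    """Propagate class scores along the graph for a fixed number of steps.

    cam:        (N, C) initial class activation scores per pixel.
    transition: (N, N) row-stochastic matrix from transition_matrix().
    """
    refined = cam.copy()
    for _ in range(steps):
        refined = transition @ refined
    return refined
```

In this sketch each step replaces a pixel's class scores with an affinity-weighted average over the pixels it is connected to, so a strong activation on one discriminative pixel spreads to similar pixels in the same cluster.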
