Abstract

Although extremely challenging, weakly-supervised semantic segmentation using image-level labels has made encouraging progress recently. Existing methods mainly adopt a two-stage training procedure: (a) optimizing the class activation map (CAM) produced by a multi-label classification network to generate pseudo ground truth; (b) training a conventional fully supervised semantic segmentation network on this pseudo ground truth. When optimizing the CAM, most advanced methods consider only the problem that the CAM activates sparse, discriminative regions for each class. However, because the loss function of the classification task provides only image-level supervision, a classification network is weak at capturing intricate contextual information, which leads to a second problem: many misclassified regions are activated in the CAM. By contrast, the loss function of the semantic segmentation task provides pixel-level supervision, which makes a segmentation network better at capturing intricate contextual information. Building on this ability of the segmentation network, we propose an erasing module that erases the misclassified regions in the CAM. Furthermore, to transform the sparse CAM into high-quality dense pseudo ground truth, we apply the proposed hierarchical deep seeded region growing (H-DSRG) to the erased CAM. Finally, we conduct extensive analysis to validate the proposed method, which achieves 66.8 mIoU on the PASCAL VOC 2012 val set and 67.6 mIoU on the PASCAL VOC 2012 test set, establishing new state-of-the-art results.
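To make the first stage concrete, the sketch below computes a CAM in the standard way (classifier weights applied to the last convolutional feature maps) and then applies a simple erasing rule. The erasing rule is an assumption made for illustration only: the abstract does not specify how the proposed erasing module works, so the function `erase_with_segmentation`, its `seed_threshold` parameter, and the `unlabeled` marker are hypothetical. H-DSRG is not sketched here.

```python
import numpy as np


def class_activation_maps(features, fc_weights):
    """Standard CAM: weight the last conv feature maps by the classifier weights.

    features:   (K, H, W) feature maps before global average pooling
    fc_weights: (C, K) weights of the final linear classification layer
    returns:    (C, H, W) maps, passed through ReLU and scaled to [0, 1] per class
    """
    cams = np.einsum("ck,khw->chw", fc_weights, features)
    cams = np.maximum(cams, 0.0)
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    peak = np.where(peak > 0, peak, 1.0)  # avoid dividing by zero for empty maps
    return cams / peak[:, None, None]


def erase_with_segmentation(cams, seg_pred, seed_threshold=0.5, unlabeled=-1):
    """Hypothetical erasing rule (an assumption, not the paper's module).

    A pixel becomes a seed for its strongest class when the CAM score reaches
    seed_threshold; the seed is erased (set to `unlabeled`) when the
    segmentation network's prediction `seg_pred` (H, W) disagrees with it.
    """
    seeds = cams.argmax(axis=0)
    confident = cams.max(axis=0) >= seed_threshold
    agrees = seeds == seg_pred
    return np.where(confident & agrees, seeds, unlabeled)
```

The surviving seeds form a sparse label map. In the pipeline described above, a region-growing step would then expand them into dense pseudo ground truth for the second training stage.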

