Abstract

In this paper, we propose a weakly supervised semantic segmentation method that learns directly from web images, which are crawled from the Internet with text queries, without any explicit user annotation or even data filtering. To handle the large amount of label noise in web images, we design a three-stage approach to weakly supervised semantic segmentation based on curriculum learning. We first generate pixel-level masks for the training images with a popular weakly supervised semantic segmentation framework. We then account for the noise in the web data in two ways. At the image level, the complexity of each image is measured by its distribution density in a classification feature space. At the pixel level, the complexity of each mask is evaluated, in an unsupervised manner, by exploiting the relationship between the saliency map and the segmentation mask of the same image. The key insight behind this design is that common, simple object patterns should be salient to both the saliency detector and the weakly supervised DCNNs, so the regions they produce should be sparse and highly consistent with each other. This allows curriculum learning from noisy web images to be implemented efficiently. Experiments on the PASCAL VOC 2012 benchmark show that we achieve competitive performance, reaching 64.0% mIoU using only our web dataset, which contains noisy, single-label images. Fine-tuning CurriculumWebSegNet on the PASCAL VOC dataset, which provides more precise multi-label supervision, further improves the performance to 69.7% mIoU.
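The two complexity measures described above can be illustrated with a minimal sketch. It is not the authors' implementation. The k-nearest-neighbour density estimate, the use of intersection-over-union (IoU) as the regional-consistency measure, the equal-weight combination of the two scores, and the function names `knn_density`, `mask_iou` and `curriculum_stages` are all hypothetical choices made for illustration. Feature vectors are assumed to come from the classification network, and saliency and segmentation masks are assumed to be binary and pixel-aligned.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Mask = Sequence[Sequence[int]]  # binary H x W mask, 1 = foreground


def knn_density(features: List[Vector], k: int = 3) -> List[float]:
    """Image-level score (assumed estimator): inverse mean distance to the
    k nearest neighbours in feature space. Higher = denser region = 'easier'."""
    if len(features) < 2:
        raise ValueError("need at least two feature vectors")
    k = max(1, min(k, len(features) - 1))
    densities = []
    for i, f in enumerate(features):
        dists = sorted(math.dist(f, g) for j, g in enumerate(features) if j != i)
        mean_dist = sum(dists[:k]) / k
        densities.append(1.0 / (mean_dist + 1e-8))
    return densities


def mask_iou(saliency: Mask, segmentation: Mask) -> float:
    """Pixel-level score (assumed measure): IoU between a binarised saliency
    map and the segmentation mask. Returns 0.0 when both masks are empty."""
    inter = union = 0
    for s_row, m_row in zip(saliency, segmentation):
        for s, m in zip(s_row, m_row):
            inter += int(bool(s) and bool(m))
            union += int(bool(s) or bool(m))
    return inter / union if union else 0.0


def curriculum_stages(densities: List[float], ious: List[float],
                      n_stages: int = 3) -> List[List[int]]:
    """Rank samples from easy to hard and split them into n_stages subsets.
    The equal-weight average of normalised density and IoU is hypothetical."""
    lo, hi = min(densities), max(densities)
    span = (hi - lo) or 1.0
    scores = [0.5 * ((d - lo) / span + iou) for d, iou in zip(densities, ious)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    return [order[s * n // n_stages:(s + 1) * n // n_stages]
            for s in range(n_stages)]


if __name__ == "__main__":
    # Three clustered images plus one outlier in a 2-D feature space.
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
    sal = [[1, 1], [0, 0]]
    segs = [[[1, 1], [0, 0]], [[1, 0], [0, 0]], [[1, 1], [0, 0]], [[0, 0], [1, 1]]]
    dens = knn_density(feats, k=2)
    ious = [mask_iou(sal, seg) for seg in segs]
    print(curriculum_stages(dens, ious, n_stages=2))  # → [[0, 2], [1, 3]]
```

In this toy example, the outlier image whose mask disagrees with the saliency map lands in the last (hardest) stage, which is the ordering a curriculum would present to the network last.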
