Abstract

Weakly supervised segmentation has drawn considerable attention because of the high cost of creating the pixel-wise annotated image datasets used for training fully supervised segmentation models. We propose a weakly supervised semantic segmentation method using CNN-based class-specific saliency maps and a fully connected CRF. To obtain distinct class-specific saliency maps (DCSM) that can be used as unary potentials of the CRF, we propose a novel method of estimating class saliency maps, which significantly improves the method proposed by Simonyan et al. (2014) through the following improvements: (1) using CNN derivatives with respect to feature maps of the intermediate convolutional layers with up-sampling, instead of with respect to an input image; (2) subtracting the saliency maps of the other classes from the saliency maps of the target class to differentiate target objects from other objects; (3) aggregating multiple-scale class saliency maps to compensate for the low resolution of feature maps. In addition, we propose a novel algorithm for estimating segmentation "Easiness", combined with the proposed saliency-based method. Wei et al. (2016) recently demonstrated that a fully supervised segmentation model enhanced the performance of weakly supervised segmentation when the model was trained on initial masks estimated in a weakly supervised setting. However, the initial estimated masks tend to include some noise, which sometimes produces erroneous results. Therefore, we focus on improving the quality of the initial estimated masks used for training a fully supervised segmentation model. We propose a method for retrieving "good seeds" by predicting the segmentation "Easiness" of images based on the consistency of the outputs under different conditions. We show that our proposed method can retrieve "good seeds".
Despite the trade-off between training data quality and the number of training images, the retrieved images can improve the accuracy of weakly supervised segmentation when combined with data augmentation.
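The DCSM construction described in improvements (2) and (3) of the abstract can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names are invented, the CNN-derivative maps are taken as given inputs, nearest-neighbour up-sampling stands in for the paper's up-sampling, and each scale is assumed to divide the target resolution evenly.

```python
import numpy as np

def distinct_saliency(saliency):
    """Improvement (2): subtract competing classes' saliency.

    saliency: array of shape (C, H, W), one raw saliency map per class.
    For each class, subtract the strongest response among the other
    classes at each pixel and clip negatives to zero, so that only
    regions where the target class dominates survive.
    """
    out = np.empty_like(saliency)
    for c in range(saliency.shape[0]):
        others = np.delete(saliency, c, axis=0)
        out[c] = np.clip(saliency[c] - others.max(axis=0), 0.0, None)
    return out

def aggregate_scales(maps, target_hw):
    """Improvement (3): aggregate multi-scale saliency maps.

    maps: list of (C, h, w) arrays computed at different resolutions.
    Each map is up-sampled to target_hw (nearest-neighbour, via np.kron)
    and the results are averaged, compensating for low-resolution
    feature maps. Assumes each (h, w) divides (H, W) evenly.
    """
    H, W = target_hw
    acc = np.zeros((maps[0].shape[0], H, W))
    for m in maps:
        ry, rx = H // m.shape[1], W // m.shape[2]
        acc += np.kron(m, np.ones((1, ry, rx)))
    return acc / len(maps)
```

In the full pipeline, the aggregated distinct maps would then serve as unary potentials for the fully connected CRF that produces the final segmentation.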
