Abstract

Training a fully supervised semantic segmentation network requires a large amount of expensive pixel-level annotation produced by manual labor. In this work, we study the semantic segmentation problem using only image-level supervision. An effective weakly supervised segmentation scheme first produces proxy annotations from image tags; the segmentation network is then retrained on these noisy proxy annotations. However, learning from noisy annotations is risky, as poor-quality proxy annotations may degrade the performance of the baseline segmentation and classification networks. To train the segmentation network on noisy annotations more effectively, this paper proposes two novel loss functions: a selection loss and an attention loss. First, the selection loss weights the proxy annotations using a coarse-to-fine strategy that evaluates the quality of the segmentation masks. Second, the attention loss takes the clean image tags as supervision to correct classification errors caused by ambiguous pixel-level labels. Finally, we propose SAL-Net, an end-to-end semantic segmentation network guided by these two losses. In extensive experiments on the PASCAL VOC 2012 dataset, SAL-Net achieves state-of-the-art performance, reaching mean IoU (mIoU) of 62.5% and 66.6% on the test set with VGG16 and ResNet101 baselines, respectively, and outperforming eight representative weakly supervised segmentation methods. The code and models are available at https://github.com/zmbhou/SALTMM.
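To make the interplay of the two losses concrete, the following is a minimal PyTorch-style sketch of how a selection-weighted segmentation loss and a tag-supervised attention loss might be combined. All tensor names, the per-image quality weights, and the specific loss forms are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sal_losses(seg_logits, proxy_masks, mask_quality, cls_logits, image_tags):
    """Hypothetical sketch of the two losses described in the abstract.

    seg_logits:   (B, C, H, W) segmentation network outputs
    proxy_masks:  (B, H, W)    noisy proxy annotations generated from image tags
    mask_quality: (B,)         per-image weights from a coarse-to-fine quality estimate
    cls_logits:   (B, C)       classification branch outputs
    image_tags:   (B, C)       clean multi-hot image-level labels
    """
    # Selection loss (assumed form): pixel-wise cross-entropy against the
    # proxy masks, down-weighted for images whose proxy annotations are
    # judged to be of low quality.
    per_image_ce = F.cross_entropy(
        seg_logits, proxy_masks, reduction="none"
    ).mean(dim=(1, 2))
    selection_loss = (mask_quality * per_image_ce).mean()

    # Attention loss (assumed form): multi-label classification against the
    # clean image tags, correcting errors induced by ambiguous pixel labels.
    attention_loss = F.binary_cross_entropy_with_logits(cls_logits, image_tags)

    return selection_loss + attention_loss
```

Down-weighting rather than discarding low-quality proxy masks lets every image contribute some gradient signal while limiting the damage from the noisiest annotations.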
