Abstract

Generating training sets for Deep Convolutional Neural Networks (DCNNs) is a bottleneck for modern real-world applications. The task is especially demanding when annotating training data is costly, as in semantic segmentation. In the literature, there is still a gap between the performance achieved by a network trained on full annotations and one trained on weak annotations. In this paper, we establish a simple and natural strategy to measure this gap and to identify the components necessary to reduce it. On scribbles, we establish new state-of-the-art results: we obtain a mIoU of 75.6% without, and 75.7% with, CRF post-processing. We reduce the gap by 64.2%, whereas the current state-of-the-art reduces it only by 57.5%. Thanks to a formal reformulation of the weak supervision problem, a systematic study of the components involved, and an original experimental strategy, we unravel a counter-intuitive mechanism analogous to the philosophy of ensemble learning: averaging poor local predicted annotations with a generic naive baseline and reusing them for training a DCNN yields new state-of-the-art results. This strategy is simple and generalizes readily to other weakly-supervised scenarios. We show that our strategy effortlessly accommodates other pixel-level weak annotations, such as bounding boxes, and remains competitive.
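The core fusion step described above — averaging a network's per-pixel predictions with a generic naive baseline before reusing them as pseudo-labels — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the mixing weight `alpha`, and the choice of a uniform distribution as the naive baseline are assumptions for the example.

```python
import numpy as np

def fuse_pseudo_labels(pred_probs, baseline_probs, alpha=0.5):
    """Average DCNN soft predictions with a naive baseline, per pixel.

    pred_probs, baseline_probs: arrays of shape (H, W, C), each holding
    a per-pixel class distribution. Returns an (H, W) array of hard
    pseudo-labels that could be reused to retrain the network.
    """
    fused = alpha * pred_probs + (1.0 - alpha) * baseline_probs
    return fused.argmax(axis=-1)

# Toy example: a 2x2 "image" with 3 classes.
pred = np.array([[[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
                 [[0.1, 0.1, 0.8], [0.4, 0.4, 0.2]]])
base = np.full((2, 2, 3), 1 / 3)  # uniform naive baseline (assumption)
labels = fuse_pseudo_labels(pred, base)  # shape (2, 2)
```

With a uniform baseline the averaging leaves the argmax unchanged but flattens the fused distribution, which mirrors the ensemble-learning intuition: a weak but unbiased prior tempers overconfident local predictions before they are recycled as training targets.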
