Abstract

Generating training sets for Deep Convolutional Neural Networks (DCNNs) is a bottleneck for modern real-world applications. The task is especially demanding when annotating training data is costly, as in semantic segmentation. In the literature, there is still a gap between the performance achieved by a network trained on full annotations and one trained on weak annotations. In this paper, we establish a simple and natural strategy to measure this gap and to identify the components necessary to reduce it. On scribbles, we establish new state-of-the-art results: we obtain a mIoU of 75.6% without, and 75.7% with, CRF post-processing. We reduce the gap by 64.2%, whereas the current state-of-the-art reduces it only by 57.5%. Thanks to a formal reformulation of the weak supervision problem, a systematic study of the components involved, and an original experimental strategy, we unravel a counter-intuitive mechanism analogous to the philosophy of ensemble learning: averaging poor local predicted annotations with a generic naive baseline and reusing them for training a DCNN yields new state-of-the-art results. This strategy is simple and generalizes readily to other weakly-supervised scenarios. We show that our strategy effortlessly accommodates other pixel-level weak annotations, such as bounding boxes, and remains competitive.
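The core fusion step described above — averaging a network's per-pixel predictions with a generic naive baseline before reusing them as pseudo-labels — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the mixing weight `alpha`, and the choice of a uniform distribution as the naive baseline are assumptions for the example.

```python
import numpy as np

def fuse_pseudo_labels(pred_probs, baseline_probs, alpha=0.5):
    """Average DCNN soft predictions with a naive baseline, per pixel.

    pred_probs, baseline_probs: arrays of shape (H, W, C), each holding
    a per-pixel class distribution. Returns an (H, W) array of hard
    pseudo-labels that could be reused to retrain the network.
    """
    fused = alpha * pred_probs + (1.0 - alpha) * baseline_probs
    return fused.argmax(axis=-1)

# Toy example: a 2x2 "image" with 3 classes.
pred = np.array([[[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
                 [[0.1, 0.1, 0.8], [0.4, 0.4, 0.2]]])
base = np.full((2, 2, 3), 1 / 3)  # uniform naive baseline (assumption)
labels = fuse_pseudo_labels(pred, base)  # shape (2, 2)
```

With a uniform baseline the averaging leaves the argmax unchanged but flattens the fused distribution, which mirrors the ensemble-learning intuition: a weak but unbiased prior tempers overconfident local predictions before they are recycled as training targets.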
