Abstract

Weakly supervised learning constructs predictive models from weak supervision signals. In this paper, we concentrate on weakly supervised object localization and semantic segmentation. Existing methods either focus on narrow discriminative parts or overextend activations to less discriminative regions, even onto the background. To mitigate these problems, we regard the background as an important cue that guides feature activations to cover the entire object to the right extent, and propose two novel objective functions: 1) a contrastive attention loss and 2) a foreground consistency loss. The contrastive attention loss draws the foreground feature and its dropped version close together while pushing the dropped foreground feature away from the background feature. The foreground consistency loss favors agreement between layers and provides early layers with a sense of objectness. Using both losses yields balanced improvements in localization and segmentation accuracy by boosting activations on less discriminative regions while restraining activations to the extent of the target object. To better optimize these losses, we replace channel-pooled attention with non-local attention blocks, which enhance the attention maps by accounting for spatial similarity. Finally, our method achieves state-of-the-art localization performance on the CUB-200-2011, ImageNet, and OpenImages benchmarks in terms of top-1 localization accuracy, MaxBoxAccV2, and PxAP. We also demonstrate the effectiveness of our method in improving segmentation performance, measured by mIoU, on the PASCAL VOC dataset.
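The two objectives can be illustrated with a minimal sketch. This is not the paper's implementation: the triplet-style margin formulation, the margin value, the use of Euclidean distance, and the mean-squared-error consistency term are all assumptions made here for illustration; the function and variable names are hypothetical.

```python
import numpy as np

def contrastive_attention_loss(f_fg, f_fg_drop, f_bg, margin=1.0):
    """Hypothetical triplet-style sketch: pull the dropped foreground
    feature toward the foreground feature (positive pair) and push it
    away from the background feature (negative pair)."""
    d_pos = np.linalg.norm(f_fg - f_fg_drop)   # foreground vs. dropped foreground
    d_neg = np.linalg.norm(f_fg_drop - f_bg)   # dropped foreground vs. background
    return max(0.0, d_pos - d_neg + margin)

def foreground_consistency_loss(attn_early, attn_late):
    """Hypothetical sketch: penalize disagreement between an early-layer
    attention map and a later-layer one via mean squared error, so early
    layers inherit the later layers' sense of objectness."""
    return float(np.mean((attn_early - attn_late) ** 2))
```

When the dropped foreground feature already sits near the foreground feature and far from the background feature, the margin is satisfied and the contrastive term is zero; the consistency term vanishes when the two attention maps agree.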
