Abstract

Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels. Adversarial erasing, a common WSOL method, masks the most discriminative region in the feature space to compel the network to localize more of the object. However, once the discriminative region vanishes, the localizer struggles to distinguish object regions from the background. In this paper, we propose feature disparity learning (FDL), which encourages the network to learn more distinctive features from the object region by measuring similarity after feature enhancement. Specifically, we first introduce a Spatial Vector Cross Attention (SVCA) module. This module enhances responses in the less discriminative regions of erased feature maps by capturing interdependencies among spatial vectors on each channel and using them to reintegrate the spatial distribution of features. Furthermore, we propose a feature complementarity loss that measures the similarity between unerased and erased features, guiding the network to learn the feature disparities caused by adversarial erasing and thereby improving localization and classification. Experiments on the CUB-200-2011 and ILSVRC 2016 datasets demonstrate a significant increase in localization performance over existing state-of-the-art erasing methods.
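To make the erasing step and the similarity measurement concrete, the following is a minimal sketch and not the paper's implementation. It assumes the attention map is the channel-wise mean of the features, that erasing zeroes positions above a fixed fraction of the peak response, and that similarity is plain cosine similarity. The abstract does not specify any of these choices, and the names `attention_map`, `erase`, `cosine_similarity`, and `ratio` are hypothetical. The SVCA module is not modeled here.

```python
import math

def attention_map(features):
    """Average a C x H x W feature map over channels (assumed attention)."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [[sum(features[c][i][j] for c in range(channels)) / channels
             for j in range(width)] for i in range(height)]

def erase(features, ratio=0.7):
    """Zero every spatial position whose attention exceeds ratio * peak.

    The thresholding rule and the default ratio are assumptions made
    for illustration only.
    """
    attn = attention_map(features)
    threshold = ratio * max(max(row) for row in attn)
    return [[[0.0 if attn[i][j] > threshold else value
              for j, value in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel in features]

def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature maps."""
    flat_a = [v for channel in a for row in channel for v in row]
    flat_b = [v for channel in b for row in channel for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = (math.sqrt(sum(x * x for x in flat_a))
            * math.sqrt(sum(y * y for y in flat_b)))
    return dot / norm if norm else 0.0

# Toy 2-channel, 2x2 feature map; position (0, 0) has the strongest response.
features = [[[4.0, 1.0], [1.0, 0.0]],
            [[2.0, 1.0], [1.0, 2.0]]]
erased = erase(features)
similarity = cosine_similarity(features, erased)
```

In this toy case only the top-left position is erased, so `similarity` drops below 1. A loss built on such a similarity score would be one way to expose the disparity between unerased and erased features, though the exact form used in the paper may differ.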
