Abstract

Weakly supervised object localization (WSOL) is a challenging task that aims to localize objects in images using only image-level labels. Despite the widespread use of WSOL methods based on class activation mapping (CAM), such methods do not account for the network's tendency to focus excessively on local regions of the objects of interest during localization, thereby neglecting global information. To address this issue, we introduce an additional attention branch for convolutional neural networks (CNNs) that uses a multi-layer perceptron (MLP)-based attention mechanism to enhance the network's learning of global information and supervises the CNN's feature learning online through knowledge distillation, thereby improving the localization accuracy of WSOL. Specifically, we design a new loss function that combines the generated features with contrastive learning to effectively separate the foreground and background of an image, providing more accurate pseudo-labels for the subsequent classification and localization tasks. We evaluated our method on the CUB-200-2011 dataset and compared it with existing methods. The experimental results show that our method achieves good performance on WSOL tasks.
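To make the two ideas in the abstract concrete, the following is a minimal sketch in PyTorch of (a) an auxiliary MLP-based attention branch whose attention map supervises the CNN's activation map via online distillation, and (b) a contrastive-style loss that separates pooled foreground and background features. This is not the authors' implementation: the module names, token-mixing design, pooling scheme, positive/negative pairing, and loss weights are all illustrative assumptions.

```python
# Minimal sketch (assumed PyTorch implementation, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLPAttentionBranch(nn.Module):
    """Auxiliary branch: an MLP mixes information across all spatial positions
    (global context) and predicts a per-position foreground attention map."""

    def __init__(self, channels: int, num_tokens: int, hidden: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        # Token-mixing MLP operates over the spatial (token) dimension.
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens)
        )
        self.score = nn.Linear(channels, 1)  # per-position foreground score

    def forward(self, feat: torch.Tensor) -> torch.Tensor:   # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)              # (B, HW, C)
        mixed = self.token_mlp(self.norm(tokens).transpose(1, 2)).transpose(1, 2)
        tokens = tokens + mixed                                # global spatial mixing
        attn = self.score(tokens).view(b, 1, h, w)
        return torch.sigmoid(attn)                             # (B, 1, H, W)


def distillation_loss(cam: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    """Online distillation: align the CNN's activation map with the attention
    branch's map (the branch output is detached here, an illustrative choice)."""
    return F.mse_loss(torch.sigmoid(cam), attn.detach())


def fg_bg_contrastive_loss(feat: torch.Tensor, attn: torch.Tensor,
                           tau: float = 0.1) -> torch.Tensor:
    """Contrastive-style separation of foreground and background: the
    attention-pooled foreground feature is pulled toward the image's global
    feature (positive) and pushed away from background features (negatives).
    The exact pairing here is an assumption for illustration."""
    fg = (feat * attn).flatten(2).sum(-1) / (attn.flatten(2).sum(-1) + 1e-6)
    bg = (feat * (1 - attn)).flatten(2).sum(-1) / ((1 - attn).flatten(2).sum(-1) + 1e-6)
    gap = feat.mean(dim=(2, 3))                                # global average pool
    fg, bg, gap = F.normalize(fg, dim=1), F.normalize(bg, dim=1), F.normalize(gap, dim=1)
    pos = (fg * gap).sum(-1, keepdim=True) / tau               # (B, 1)
    neg = fg @ bg.t() / tau                                    # (B, B)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(feat.size(0), dtype=torch.long, device=feat.device)
    return F.cross_entropy(logits, labels)                     # InfoNCE over the batch


# Example usage with a ResNet-style 14x14 feature map and a CNN activation map.
feat = torch.randn(2, 512, 14, 14)
cam = torch.randn(2, 1, 14, 14)
branch = MLPAttentionBranch(channels=512, num_tokens=14 * 14)
attn = branch(feat)
loss = distillation_loss(cam, attn) + fg_bg_contrastive_loss(feat, attn)
```

In this sketch the thresholded attention map `attn` would play the role of the foreground/background pseudo-labels mentioned in the abstract, and the total loss would be added to the usual image-level classification loss.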
