Abstract

Weakly supervised object localization (WSOL) is a challenging task that aims to localize objects in images using only image-level labels. Although WSOL methods based on class activation mapping (CAM) are widely used, they do not account for the fact that the network may overly focus on local regions of the object of interest during localization, thereby neglecting the object as a whole. To address this issue, we introduce an additional attention branch for convolutional neural networks (CNNs) that uses a multi-layer perceptron (MLP) attention mechanism to strengthen the network's learning of global information and supervises the CNN's feature learning online through knowledge distillation, thereby improving the localization accuracy of WSOL. Specifically, we design a new loss function that combines the generated features with contrastive learning, effectively separating the foreground and background of an image to provide more accurate pseudo-labels for the subsequent classification and localization tasks. In experiments on the CUB-200-2011 dataset, we compared our method with existing approaches, and the results show that our method achieves good performance on WSOL tasks.
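The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the three ingredients it names: an MLP-based attention branch that pools global context from CNN feature maps, a soft-label knowledge-distillation loss between the two branches, and a contrastive-style loss that pushes attention-weighted foreground features away from background features. All module names, layer sizes, the margin form of the contrastive term, and the temperature are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPAttentionBranch(nn.Module):
    """Auxiliary branch: per-location MLP attention over flattened CNN
    feature maps, intended to capture global (whole-object) context."""
    def __init__(self, channels: int, hidden: int = 256):
        super().__init__()
        self.score = nn.Sequential(            # per-location attention score
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, feat: torch.Tensor):
        # feat: (B, C, H, W) feature map from the CNN backbone
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = self.score(tokens).softmax(dim=1)        # (B, HW, 1)
        global_feat = (attn * tokens).sum(dim=1)        # (B, C) attention-pooled
        attn_map = attn.transpose(1, 2).reshape(b, 1, h, w)
        return global_feat, attn_map

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soft-label KD: the attention branch supervises the CNN branch online."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

def fg_bg_contrastive_loss(feat, attn_map, margin: float = 1.0):
    """Hinge-style contrast between attention-weighted foreground features
    and the complementary background features (hypothetical formulation)."""
    fg = (feat * attn_map).flatten(2).mean(dim=2)          # (B, C)
    bg = (feat * (1.0 - attn_map)).flatten(2).mean(dim=2)  # (B, C)
    dist = F.pairwise_distance(fg, bg)
    return F.relu(margin - dist).mean()
```

In such a setup, the attention map could be thresholded to split each image into foreground and background regions, yielding pseudo-labels for the downstream classification and localization heads; again, this is only one plausible reading of the abstract.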
