DiPS: Discriminative pseudo-label sampling with self-supervised transformers for weakly supervised object localization

Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger

doi:10.1016/j.imavis.2023.104838

Abstract

Self-supervised vision transformers (SSTs) have shown great potential to yield rich localization maps that highlight different objects in an image. However, these maps remain class-agnostic since the model is unsupervised. They often tend to decompose the image into multiple maps containing different objects while being unable to distinguish the object of interest from background noise objects. In this paper, Discriminative Pseudo-label Sampling (DiPS) is introduced to leverage these class-agnostic maps for weakly supervised object localization (WSOL), where only image-class labels are available. Given multiple attention maps, DiPS relies on a pre-trained classifier to identify the most discriminative regions of each attention map. This ensures that the selected regions of interest (ROIs) cover the correct image object while discarding the background ones, and, as such, provides a rich pool of diverse and discriminative proposals to cover different parts of the object. Subsequently, these proposals are used as pseudo-labels to train our new transformer-based WSOL model designed to perform classification and localization tasks. Unlike standard WSOL methods, DiPS optimizes performance in both tasks by using a transformer encoder and a dedicated output head for each task, each trained using dedicated loss functions. To avoid overfitting to a single proposal and to promote better object coverage, a single proposal is randomly selected among the top ones for a training image at each training step. Experimental results on the challenging CUB, ILSVRC, OpenImages, and TelDrone datasets indicate that our architecture, in combination with our transformer-based proposals, can yield better localization performance than state-of-the-art methods. Our code is available at https://github.com/shakeebmurtaza/dips.
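The selection-then-sampling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that); it only shows the general idea of ranking candidate region masks by a classifier score, keeping the top ones, and drawing one at random per training step. The names `rank_proposals`, `sample_pseudo_label`, `toy_score`, and `top_k` are hypothetical, the uniform draw is an assumption, and the toy score (overlap with a fixed region) merely stands in for a pre-trained classifier's confidence.

```python
import random
from typing import Callable, List, Sequence

Mask = List[List[int]]  # binary mask: 1 marks a pixel inside the proposed region


def rank_proposals(
    masks: Sequence[Mask],
    score_fn: Callable[[Mask], float],
    top_k: int,
) -> List[Mask]:
    """Keep the top_k masks with the highest score, best first."""
    return sorted(masks, key=score_fn, reverse=True)[:top_k]


def sample_pseudo_label(top_masks: Sequence[Mask], rng: random.Random) -> Mask:
    """Draw one proposal uniformly at random (an assumed sampling scheme)."""
    return rng.choice(top_masks)


# Toy stand-in for a classifier score: overlap with a fixed "object" region.
# A real system would query a pre-trained classifier instead (assumption).
OBJECT = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]


def toy_score(mask: Mask) -> float:
    return float(
        sum(m * o for m_row, o_row in zip(mask, OBJECT) for m, o in zip(m_row, o_row))
    )


proposals = [
    [[1, 0, 0], [1, 0, 0], [1, 0, 0]],  # background column, overlap 0
    [[0, 1, 1], [0, 0, 0], [0, 0, 0]],  # upper part of the object, overlap 2
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],  # whole object, overlap 4
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # single object pixel, overlap 1
]

top = rank_proposals(proposals, toy_score, top_k=2)
rng = random.Random(0)
# One pseudo-label per simulated training step; the choice varies across steps.
pseudo_labels = [sample_pseudo_label(top, rng) for _ in range(5)]
```

Sampling among several high-scoring proposals, rather than always taking the single best one, is what the abstract credits with reducing overfitting to one proposal and improving object coverage.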

Journal: Image and Vision Computing
Publication Date: Oct 13, 2023
Citations: 4

Similar Papers

Discriminative Sampling of Proposals in Self-Supervised Transformers for Weakly Supervised Object Localization
Shakeeb Murtaza ... Marco Pedersoli
01 Jan 2023

ViTOL: Vision Transformer for Weakly Supervised Object Localization
Saurav Gupta ... Rahul Tallamraju
01 Jun 2022

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
Yuan Yao ... Qixiang Ye
IEEE Transactions on Neural Networks and Learning Systems | VOL. 35
01 Jul 2024

Weakly supervised object localization via knowledge distillation based on foreground–background contrast
Siteng Ma ... Licheng Jiao
Neurocomputing | VOL. 576
30 Dec 2023
