Abstract

Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels, offering better scalability and practicality than fully supervised methods. However, without pixel-level supervision, existing methods tend to generate coarse localization maps, which limits localization performance. To alleviate this problem, we propose an adversarial transformer network (ATNet), which aims to obtain a well-learned localization model with pixel-level pseudo labels. The proposed ATNet enjoys several merits. First, we design an object transformer (G) that can generate localization maps and pseudo labels effectively and dynamically, and a part transformer (D) that accurately discriminates detailed local differences between localization maps and pseudo labels. Second, we propose to train G and D via an adversarial process, in which G learns to generate more accurate localization maps that approach the pseudo labels in order to fool D. To the best of our knowledge, this is the first work to explore transformers with adversarial training to obtain a well-learned localization model for WSOL. Extensive experiments with four backbones on two standard benchmarks demonstrate that our ATNet achieves favorable performance against state-of-the-art WSOL methods. In addition, our adversarial training provides higher robustness against adversarial attacks.
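The following is a minimal sketch of the adversarial process described above, assuming a standard GAN-style alternating update in which the discriminator D treats pseudo labels as "real" and generated localization maps as "fake". The module definitions (ObjectTransformer, PartTransformer), the pseudo-label source, and the binary cross-entropy objective are illustrative assumptions; the abstract does not specify ATNet's actual architectures or losses.

```python
# Hedged sketch of adversarial training between an object transformer G and a
# part transformer D. All architectures and losses here are placeholders.
import torch
import torch.nn as nn

class ObjectTransformer(nn.Module):      # stand-in for the generator G
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return torch.sigmoid(self.net(x))  # localization map in [0, 1]

class PartTransformer(nn.Module):        # stand-in for the discriminator D
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(8, 1))
    def forward(self, m):
        return self.net(m)               # real/fake logit for a map

G, D = ObjectTransformer(), PartTransformer()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(images, pseudo_labels):
    """One alternating update: D learns to separate pseudo labels from
    generated maps; G learns to produce maps that D accepts as pseudo labels."""
    b = images.size(0)
    # --- update D: pseudo labels are "real", generated maps are "fake" ---
    with torch.no_grad():
        fake_maps = G(images)
    d_loss = bce(D(pseudo_labels), torch.ones(b, 1)) + \
             bce(D(fake_maps), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- update G: fool D into scoring generated maps as pseudo labels ---
    maps = G(images)
    g_loss = bce(D(maps), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# usage with random tensors standing in for an image batch and pseudo labels
imgs = torch.randn(4, 3, 64, 64)
pseudo = torch.rand(4, 1, 64, 64)
print(train_step(imgs, pseudo))
```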
