Learning Consistency From High-Confidence Pseudo-Labels for Weakly Supervised Object Localization

Kangbo Sun, Jie Zhu

doi:10.1109/access.2023.3246259

Abstract

Weakly supervised object localization (WSOL) aims to classify and locate a single object using only image-level labels as supervision. Pseudo-supervised learning methods, which divide WSOL into two decoupled subtasks (classification and localization), have been shown to be effective for WSOL. This decoupled framework improves the performance of the localization subtask, but the predicted localizations are not robust enough because the pseudo-labels are noisy. Based on the assumption that a localization model should make similar predictions on different versions of the same image, we propose an additional refinement stage to learn more consistent localization. Specifically, in the refinement stage, we propose a simple and effective method for evaluating the confidence of pseudo-labels based on classification discrimination; by learning consistency from high-confidence pseudo-labels, we further refine the localization model to improve its localization performance. In addition, in the initialization stage, we propose a mask-based pseudo-label generator to initialize the localization model. We conduct experiments on two benchmark datasets: CUB-200-2011 and ImageNet-1k. Experimental results show that our two-stage approach achieves 94.01% GT-Known localization accuracy on the CUB-200-2011 test set and 65.23% GT-Known localization accuracy on the ImageNet-1k validation set.
Moreover, when applied directly to the pseudo-supervised localization model, our refinement stage achieves 94.05% and 67.13% GT-Known localization accuracy on CUB-200-2011 and ImageNet-1k, respectively, outperforming the corresponding pseudo-supervised localization model by 3.34% and 2.34%.
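The abstract does not say how pseudo-label confidence is computed or which versions of an image are compared, so the following Python sketch only illustrates the general idea of learning consistency from high-confidence samples; it is not the authors' method. All function names are hypothetical. It assumes that the second version of an image is its horizontal flip, that confidence is the drop in target-class probability when the box region is masked out, and that disagreement is an L1 distance between normalized box coordinates. It computes the loss value only; gradients and training are out of scope.

```python
"""Hypothetical sketch: confidence-filtered consistency loss for box predictions.

Assumptions (not taken from the paper): boxes are (x1, y1, x2, y2) in
normalized [0, 1] coordinates, the "other version" of an image is its
horizontal flip, confidence is the drop in target-class probability when
the box region is masked out, and consistency is measured with an L1 distance.
"""
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]


def discrimination_confidence(
    p_inside: Sequence[float], p_outside: Sequence[float], label: int
) -> float:
    """Hypothetical score: target-class probability on the box crop minus
    the target-class probability on the image with the box masked out."""
    return p_inside[label] - p_outside[label]


def hflip_box(box: Box) -> Box:
    """Map a box predicted on a horizontally flipped image back to the original frame."""
    x1, y1, x2, y2 = box
    return (1.0 - x2, y1, 1.0 - x1, y2)


def l1_box_distance(a: Box, b: Box) -> float:
    """Mean absolute difference between the four box coordinates."""
    return sum(abs(u - v) for u, v in zip(a, b)) / 4.0


def consistency_loss(
    preds_orig: List[Box],
    preds_flipped: List[Box],
    confidences: List[float],
    threshold: float,
) -> float:
    """Average L1 disagreement between predictions on the original and flipped
    images, computed only over samples whose confidence reaches the threshold."""
    selected = [
        l1_box_distance(a, hflip_box(b))
        for a, b, c in zip(preds_orig, preds_flipped, confidences)
        if c >= threshold
    ]
    if not selected:
        return 0.0  # no high-confidence samples in this batch
    return sum(selected) / len(selected)


if __name__ == "__main__":
    confs = [
        discrimination_confidence([0.1, 0.9], [0.6, 0.4], label=1),  # about 0.5
        discrimination_confidence([0.5, 0.5], [0.5, 0.5], label=1),  # 0.0
    ]
    preds_orig = [(0.2, 0.1, 0.6, 0.9), (0.0, 0.0, 1.0, 1.0)]
    preds_flip = [(0.4, 0.1, 0.8, 0.9), (0.3, 0.3, 0.5, 0.5)]
    # Only the first sample passes the threshold, and its two predictions agree.
    print(consistency_loss(preds_orig, preds_flip, confs, threshold=0.3))
```

Running the script prints a value close to zero: only the first sample clears the 0.3 threshold, and its two predictions describe the same box once the flipped prediction is mapped back. Filtering before averaging keeps the second, inconsistent but low-confidence sample from contributing to the refinement signal.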
