Abstract

The class-agnostic counting (CAC) problem has recently attracted significant attention because of its broad societal applications and formidable challenges. Existing approaches to counting objects of arbitrary categories typically rely on user-provided exemplars, which are difficult to obtain and limit the generality of these methods. In this paper, our goal is to enable a counting framework to identify adaptive exemplars within the entire image. To this end, we introduce a zero-shot Generalized Counting Network (GCNet), which uses a pseudo-Siamese structure to automatically and efficiently learn pseudo-exemplar cues from inherent repetition patterns. In addition, we present a weakly supervised scheme that reduces the reliance on the laborious density maps required by all contemporary CAC models, allowing GCNet to be trained end-to-end using only count-level supervisory signals. Although no spatial location hints are provided, GCNet adaptively captures them through a carefully designed self-similarity learning strategy. Extensive experiments and ablation studies on the prevailing benchmark FSC147 for zero-shot CAC demonstrate the superiority of GCNet. It performs on par with existing exemplar-dependent methods and shows strong cross-dataset generality on crowd-specific datasets, e.g., ShanghaiTech Part A, Part B, and UCF_QNRF.
