Abstract
To alleviate the burden of labeling data to train crowd counting models, we propose a prototype-based learning approach for semi-supervised crowd counting with an embeded understanding of perspective. Our key idea is that image patches with the same density of people are likely to exhibit coherent appearance changes under similar perspective distortion, but differ significantly under varying distortions. Motivated by this observation, we construct multiple prototypes for each density level to capture variations in perspective. For labeled data, the prototype-based learning assists the regression task by regularizing the feature space and modeling the relationships within and across different density levels. For unlabeled data, the learnt perspective-embedded prototypes enhance differentiation between samples of the same density levels, allowing for a more nuanced assessment of the predictions. By incorporating regression results, we categorize unlabeled samples as reliable or unreliable, applying tailored consistency learning strategies to enhance model accuracy and generalization. Since the perspective information is often unavailable, we propose a novel pseudo-label assigner based on perspective self-organization which requires no additional annotations and assigns image regions to distinct spatial density groups, which mainly reflect the differences in average density among regions. Extensive experiments on four crowd counting benchmarks demonstrate the effectiveness of our approach.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have