Abstract

Despite impressive progress in crowd counting in recent years, reliably counting crowds across visual domains remains an open challenge. This paper addresses that setting, presenting an unsupervised cross-domain crowd counting framework that adapts across domains using only unlabeled target data. We achieve this by learning bidirectional knowledge transfer between regression- and detection-based models trained on a labeled source domain. The knowledge held by the two models is heterogeneous and complementary, as they capture different modalities of the crowd distribution. Specifically, we first formulate the mutual transformations between the outputs of the regression- and detection-based models as two scene-agnostic transformers, which enable knowledge transfer between the two models. Given the regression- and detection-based models and their mutual transformers learned on the source domain, we then introduce a self-supervised co-training scheme to encourage knowledge transfer between the two models on the target domain. We further enhance model adaptation with a modified mixup augmentation strategy. A thorough benchmark comparison against the most recent cross-domain crowd counting methods, together with detailed ablation studies, shows the advantage of our method.
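The abstract does not detail the modified mixup variant, so as a point of reference, here is a minimal sketch of standard mixup (Zhang et al., 2018) applied jointly to a crowd image and its density map; the function name, parameters, and the choice to mix the density maps linearly are illustrative assumptions, not the paper's method:

```python
import numpy as np

def mixup_pair(img_a, den_a, img_b, den_b, alpha=0.2, rng=None):
    """Vanilla mixup on an image/density-map pair (illustrative sketch,
    not the paper's modified variant).

    A mixing coefficient lam ~ Beta(alpha, alpha) blends both the images
    and their density maps, so the total count of the mixed density map
    is the same convex combination of the two original counts.
    """
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    img = lam * img_a + (1.0 - lam) * img_b
    den = lam * den_a + (1.0 - lam) * den_b  # counts mix linearly
    return img, den, lam
```

Because density-map counts are sums of pixel values, linear mixing keeps the pseudo-label count consistent with the mixed input, which is what makes mixup attractive for augmenting unlabeled target data in this kind of adaptation setting.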
