Abstract

The objective of unlabeled scene adaptive crowd counting (USACC) is to adapt a crowd counting model to a particular scene using only a handful of unlabeled images from that scene, rather than trying to cover at once all the diverse scenarios that may occur in an unknown environment. Solving this problem facilitates fast, widespread deployment of crowd counting models and mitigates the performance deterioration caused by domain shift. To tackle the USACC problem, we propose a novel method called meta-ensemble learning, which incorporates ensemble learning into the meta-learning paradigm. Specifically, we pass the input data through a network with stochasticity multiple times, implicitly creating an ensemble of models, to produce multiple distinct outputs that are averaged into pseudo labels for adapting the model. In each iteration of offline training, the scene-specific parameters are learned by minimizing the consistency loss between the actual predictions and the pseudo labels generated from a few unlabeled images belonging to that scene. We then optimize the model using the remaining labeled images from the same scene to alleviate the error accumulation caused by pseudo labels, and thus improve the accuracy of the pseudo labels in subsequent iterations. The training process explicitly simulates the process of adapting to a particular scene at test time. Therefore, at test time the model is able to adapt to the target scene using a handful of unlabeled images by minimizing the consistency loss. Extensive experiments on several benchmarks demonstrate that our method surpasses both baselines and state-of-the-art methods.
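
The unlabeled adaptation step described above can be illustrated with a toy sketch. This is not the authors' implementation: the linear count regressor, the use of input dropout as the source of stochasticity, the squared-error form of the consistency loss, and all names (`predict`, `pseudo_label`, `consistency_loss`, `adapt`) are assumptions made for illustration. The meta-level update on the remaining labeled images is omitted.

```python
import random

def predict(weights, features, rng=None, drop_p=0.0):
    """Toy linear count regressor (assumed); with rng set, applies inverted dropout to the inputs."""
    total = 0.0
    for w, x in zip(weights, features):
        if rng is not None and drop_p > 0.0:
            if rng.random() < drop_p:
                continue  # feature dropped on this stochastic pass
            x = x / (1.0 - drop_p)  # rescale the kept features
        total += w * x
    return total

def pseudo_label(weights, features, rng, passes=8, drop_p=0.2):
    """Average several stochastic passes, i.e. an implicit ensemble of models."""
    outputs = [predict(weights, features, rng, drop_p) for _ in range(passes)]
    return sum(outputs) / len(outputs)

def consistency_loss(weights, features, target):
    """Squared error between the deterministic prediction and the pseudo label (assumed form)."""
    return (predict(weights, features) - target) ** 2

def adapt(weights, unlabeled, rng, lr=0.01, passes=8, drop_p=0.2):
    """One gradient step on the mean consistency loss, with pseudo labels held fixed."""
    grads = [0.0] * len(weights)
    for features in unlabeled:
        target = pseudo_label(weights, features, rng, passes, drop_p)
        err = predict(weights, features) - target
        for i, x in enumerate(features):
            grads[i] += 2.0 * err * x / len(unlabeled)
    return [w - lr * g for w, g in zip(weights, grads)]
```

Calling `adapt([1.0, 2.0], [[1.0, 0.5], [0.2, 0.3]], random.Random(0))` returns scene-adapted weights without using any labels. In a real crowd counter the prediction would be a density map produced by a deep network rather than a scalar from a linear model.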
