Abstract

Because fully-supervised crowd counting requires expensive and laborious annotation, it is desirable to find methods that reduce the labeling burden. Large numbers of unlabeled images can be collected in the wild far more easily than labeled datasets. Exploiting the property that head annotations undergo the same spatial transformation as the image, this paper proposes a self-supervised learning framework that uses unlabeled and limited labeled data to pre-train and fine-tune a crowd counting model (SSL-FT). The framework comprises an online network and a target network that receive the same image, each under one of two randomly applied augmentation transformations. We use unlabeled data to pre-train the online network with a self-supervised loss, and small-scale labeled data to transfer the model to a specific domain with a fully-supervised loss. We demonstrate the effectiveness of SSL-FT on four public datasets (ShanghaiTech Part A, Part B, UCF-QNRF, and WorldExpo'10) using a classical counting model. Experimental results show that our approach outperforms state-of-the-art semi-supervised methods.
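The two-branch pre-training described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy per-pixel "network" stands in for the counting CNN, a horizontal flip stands in for the paper's augmentation transformations, and the function names (`self_supervised_loss`, `ema_update`) are hypothetical. The key idea shown is that both branches see the same image under different spatial transformations, the predictions are re-aligned by undoing each transformation, and a consistency loss compares them; a target network commonly tracks the online network by exponential moving average in such frameworks.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, flip):
    # A simple spatial transformation: horizontal flip (an involution,
    # so applying it twice restores the original layout).
    return img[:, ::-1] if flip else img

def toy_net(img, w):
    # Toy stand-in for a counting CNN: a learned per-pixel scaling
    # producing a "density map" prediction.
    return img * w

def self_supervised_loss(img, w_online, w_target):
    # Both branches receive the same image under two random flips.
    flips = rng.random(2) < 0.5
    online_out = toy_net(augment(img, flips[0]), w_online)
    target_out = toy_net(augment(img, flips[1]), w_target)
    # Undo each transformation so the two predictions are spatially
    # aligned before comparison, mirroring the consistent-spatial-
    # transformation property the abstract relies on.
    online_aligned = augment(online_out, flips[0])
    target_aligned = augment(target_out, flips[1])
    return np.mean((online_aligned - target_aligned) ** 2)

def ema_update(w_target, w_online, m=0.99):
    # The target network slowly tracks the online network
    # (exponential moving average), a common choice in two-branch
    # self-supervised frameworks.
    return m * w_target + (1 - m) * w_online
```

After pre-training with this consistency objective on unlabeled images, the online network would be fine-tuned on the small labeled set with an ordinary fully-supervised density-map loss.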

