Abstract
Interest in counting crowds with computer vision and machine learning techniques has grown in recent years. Although significant progress has been made, most existing methods rely heavily on fully supervised learning and require large amounts of labeled data. To reduce this reliance, we focus on the semi-supervised learning paradigm. Crowd counting is usually cast as a density estimation problem: the model is trained to predict a density map, and the total count is obtained by summing the densities over all locations. We observe that a given image admits multiple density map representations, which differ in the form of their probability distributions but agree on the total count. We therefore propose multiple representation learning, in which several models are trained, each focusing on a specific density representation, and the count consistency between models is used to supervise unlabeled data. To bypass explicit density regression, which makes a strong parametric assumption about the underlying density distribution, we propose an implicit density representation method based on the kernel mean embedding. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art semi-supervised methods.
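The count-consistency idea can be illustrated with a small sketch. It is not the method of the paper: the kernel mean embedding is not implemented, and the Gaussian kernels, the function names (`density_map`, `count`, `count_consistency`), and the toy head positions are all assumptions chosen for illustration. The sketch only shows that two density maps built with different kernel widths have different shapes yet sum to the same count, so a disagreement between counts could serve as a training signal on unlabeled images.

```python
import math

def density_map(points, height, width, sigma):
    """Build a density map with one Gaussian per annotated head.

    Each kernel is normalized over the grid so that it sums to 1
    (an assumption made here so the map sums to the head count).
    """
    grid = [[0.0] * width for _ in range(height)]
    for py, px in points:
        weights = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        for y in range(height):
            for x in range(width):
                grid[y][x] += weights[y][x] / total
    return grid

def count(density):
    """Total count: accumulate the density over all locations."""
    return sum(map(sum, density))

def count_consistency(densities):
    """Mean absolute deviation of the per-map counts from their average.

    A hypothetical consistency penalty: zero when all maps agree on the count.
    """
    counts = [count(d) for d in densities]
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts) / len(counts)

# Toy annotations (row, column) for three heads; purely illustrative.
points = [(3, 4), (10, 12), (7, 7)]
narrow = density_map(points, 16, 16, sigma=1.0)  # sharp representation
wide = density_map(points, 16, 16, sigma=3.0)    # smooth representation

print(count(narrow), count(wide))          # both close to 3
print(count_consistency([narrow, wide]))   # close to 0
```

In a semi-supervised setting, the maps would come from separate model predictions rather than from annotations, and a penalty of this kind would be applied where no labels are available.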
More From: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society