Abstract
Crowd counting in high density crowds has significant importance in crowd safety and crowd management. Existing state-of-the-art methods employ regression models to count the number of people in an image. However, regression models are blind and cannot localize the individuals in the scene. On the other hand, detection-based crowd counting in high density crowds is a challenging problem due to significant variations in scales, poses and appearances. The variations in poses and appearances can be handled through large capacity convolutional neural networks. However, the problem of scale lies in the heart of every detector and needs to be addressed for effective crowd counting. In this paper, we propose a end-to-end scale invariant head detection framework that can handle broad range of scales. We demonstrate that scale variations can be handled by modeling a set of specialized scale-specific convolutional neural networks with different receptive fields. These scale-specific detectors are combined into a single backbone network, where parameters of the network is optimized in end-to-end fashion. We evaluated our framework on challenging benchmark datasets, i.e., UCF-QNRF, UCSD. From experiment results, we demonstrate that proposed framework beats existing methods by a great margin.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.