Abstract

Accurately estimating the crowd count from a still image with arbitrary perspective and arbitrary crowd density is one of the main difficulties of crowd analysis in surveillance videos. Conventional methods are scene-specific and sensitive to occlusions. In this paper, we propose a Multi-task Multi-column Convolutional Neural Network (MMCNN) architecture for crowd counting and crowd density estimation in still images of surveillance scenes. The MMCNN architecture is an end-to-end system that is robust to images with different perspectives and different crowd densities. By extending MCNN with \(3\times 3\) filters, the MMCNN can exploit local spatial features from each column. Furthermore, the ground truth density map is generated with Perspective-Adaptive Gaussian kernels, which better represent the heads of pedestrians. Finally, we use an iterative switching process in our deep crowd model to alternately optimize the crowd density map estimation task and the crowd counting task. We conduct experiments on the WorldExpo’10 dataset, and our method achieves better results.
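
The abstract does not give the exact form of the Perspective-Adaptive Gaussian kernels, so the following is only a minimal sketch of the general idea, under stated assumptions: each annotated head contributes one normalized 2D Gaussian whose standard deviation is taken to be proportional to a perspective value at the head location. The function name `density_map`, the parameters `scale` and `min_sigma`, and the proportional rule itself are hypothetical and not taken from the paper.

```python
import numpy as np

def density_map(shape, heads, perspective, scale=0.2, min_sigma=1.0):
    """Build a ground-truth density map from head annotations.

    shape       -- (H, W) of the output map
    heads       -- iterable of integer (row, col) head positions
    perspective -- (H, W) array; assumed larger where people appear larger
    scale, min_sigma -- hypothetical tuning constants (not from the paper)
    """
    h, w = shape
    rows = np.arange(h)[:, None]   # column vector of row indices
    cols = np.arange(w)[None, :]   # row vector of column indices
    dmap = np.zeros(shape, dtype=np.float64)
    for r, c in heads:
        # Assumption: kernel width grows linearly with the perspective value.
        sigma = max(min_sigma, scale * perspective[r, c])
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Normalize over the image so each head adds exactly 1 to the total.
        dmap += kernel / kernel.sum()
    return dmap

# Toy perspective map: values increase from the top (far) to the bottom (near).
persp = np.linspace(5.0, 40.0, 64)[:, None] * np.ones((1, 64))
heads = [(10, 20), (50, 40), (30, 5)]
dmap = density_map((64, 64), heads, persp)
```

Because every kernel is normalized before it is added, the sum of the map equals the number of annotated heads, which is what lets a single density map serve both the density estimation task and the counting task.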
