Abstract

Crowd counting remains a challenging vision task because of severe occlusions, perspective distortions and scale variations in the target scene. Designing an accurate and robust crowd counting estimator has attracted intensive research interest over the past few decades. It is well known that learning rich feature representations is crucial for crowd counting. However, existing neural-network-based methods employ only the CNN features extracted from the last convolutional layer and overlook the useful hierarchical information contained in the CNN features. To address this problem, we propose a CNN architecture based on the fully convolutional network that builds an end-to-end density map estimation system by combining several of the meaningful convolutional features. This combination effectively captures both the multi-scale and the multi-level information in complex scenes. Extensive experiments on crowd counting datasets including ShanghaiTech Part A, ShanghaiTech Part B and UCF_CC_50 demonstrate the effectiveness and reliability of our approach.
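The abstract does not spell out how a density map yields a count, so the following is only a minimal sketch of the common density-map formulation, not the authors' implementation. It assumes each annotated head is spread over the image with a Gaussian kernel that is normalized to sum to one, so that summing the map recovers the number of people. The function names, the fixed `sigma`, and the toy annotations are all hypothetical.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Build a density map whose sum equals the number of annotated heads.

    points: list of (row, col) head annotations (hypothetical toy input).
    sigma:  assumed fixed Gaussian spread; real systems may adapt it per head.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Unnormalized Gaussian weights centred on the annotation.
        weights = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize inside the image so each head contributes exactly 1,
        # even when the kernel is truncated at the border.
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap


def estimate_count(dmap):
    """The crowd count is the sum of the density map over all pixels."""
    return sum(map(sum, dmap))


heads = [(3, 4), (10, 12), (0, 0)]
dmap = density_map(heads, 16, 16)
count = estimate_count(dmap)
```

In an end-to-end system of the kind described, a network would regress a map like `dmap` from the image, and the predicted count would be obtained by the same summation.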
