Abstract

For CNN-based density estimation approaches to crowd counting, generating a high-quality density map that combines accurate counting performance with detailed spatial description remains an open question. In this paper, to address this problem, we propose an end-to-end trainable architecture called the Elaborate Density Estimation Network for Crowd Counting (EDENet), which gradually generates high-quality density estimation maps under distributed supervision. Specifically, EDENet is composed of a Feature Extraction Network (FEN), a Feature Fusion Network (FFN), a Double-Head Network (DHN) and an Adaptive Density Fusion Network (ADFN). The FEN adopts VGG as the backbone and employs Spatial Adaptive Pooling (SAP) to extract coarse-grained features. The FFN effectively fuses contextual and localization information to enhance the spatial description ability of fine-grained features. In the DHN, the Density Attention Module (DAM) provides foreground-background attention masks, urging the Density Regression Module (DRM) to focus on the pixels around head annotations when regressing density maps at different resolutions. The ADFN, built on an adaptive weighting mechanism, directly introduces coarse-grained density representations into the high-resolution density map to strengthen the commonality and dependency among density maps. Extensive experiments on four benchmark crowd datasets (ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU-Crowd) show that EDENet achieves state-of-the-art counting performance and high robustness. Moreover, EDENet produces density maps with the highest Peak Signal-to-Noise Ratio (PSNR), indicating their high quality.
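To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the FEN → DHN → ADFN flow. All layer widths, module structures, the SAP approximation, the scalar fusion weight, and the omission of the FFN stage are assumptions made for illustration; the abstract does not specify the actual implementation.

```python
# Hypothetical sketch of the EDENet pipeline described in the abstract.
# Layer sizes, module internals and the fusion rule are assumptions, not the
# authors' implementation. Requires PyTorch. The FFN stage is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FEN(nn.Module):
    """Feature Extraction Network: a VGG-style truncated backbone producing
    coarse-grained features (SAP is not reproduced here; assumption)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.backbone(x)  # coarse features at 1/4 input resolution


class DHN(nn.Module):
    """Double-Head Network: a Density Attention Module (DAM) predicts a
    foreground-background mask that gates the Density Regression Module (DRM)."""
    def __init__(self, in_ch=256):
        super().__init__()
        self.dam = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())
        self.drm = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1), nn.ReLU(inplace=True),
        )

    def forward(self, feat):
        attn = self.dam(feat)            # soft foreground-background mask
        density = self.drm(feat) * attn  # attention-gated density map
        return density, attn


class EDENet(nn.Module):
    """End-to-end sketch: coarse features -> density heads at two resolutions
    -> adaptive (learned-weight) fusion into the final high-resolution map."""
    def __init__(self):
        super().__init__()
        self.fen = FEN()
        self.head_coarse = DHN()
        self.head_fine = DHN()
        self.fusion_weight = nn.Parameter(torch.tensor(0.5))  # ADFN stand-in

    def forward(self, x):
        feat = self.fen(x)
        d_coarse, _ = self.head_coarse(feat)
        # A higher-resolution head on upsampled features (assumption).
        feat_up = F.interpolate(feat, scale_factor=2, mode="bilinear",
                                align_corners=False)
        d_fine, _ = self.head_fine(feat_up)
        # Adaptive Density Fusion: inject the coarse-grained density
        # representation into the high-resolution density map.
        d_coarse_up = F.interpolate(d_coarse, size=d_fine.shape[-2:],
                                    mode="bilinear", align_corners=False)
        w = torch.sigmoid(self.fusion_weight)
        return w * d_fine + (1.0 - w) * d_coarse_up


if __name__ == "__main__":
    model = EDENet()
    img = torch.randn(1, 3, 256, 256)
    density_map = model(img)
    # The predicted crowd count is the sum over the density map.
    print(density_map.shape, density_map.sum().item())
```

In such a design, distributed supervision would attach losses to the intermediate density maps as well as the fused output, so each head receives its own training signal; the scalar fusion weight above is only a stand-in for the adaptive weighting mechanism named in the abstract.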
