Abstract

In this paper, we propose a novel network for crowd density estimation in congested scenes, the Adaptive Multi-scale Context Aggregation Network (MSCANet). MSCANet efficiently leverages spatial context information to estimate crowd density in complicated crowd scenes. To achieve this, we propose a multi-scale context learning block, the Multi-scale Context Aggregation module (MSCA), which first extracts information at different scales and then adaptively aggregates it to capture the full range of scales in the crowd. By employing multiple MSCAs in a cascaded manner, MSCANet can deeply utilize the spatial context information and modulate preliminary features into more distinguishing and scale-sensitive features, which are finally passed through a 1 × 1 convolution to obtain the crowd density map. Extensive experiments on three challenging crowd counting benchmarks showed that our model yielded compelling performance against other state-of-the-art methods. To assess the generality of MSCANet, we extended our method to two related tasks: crowd localization and remote sensing object counting. The results of these extension experiments also confirmed the effectiveness of MSCANet.
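
The pipeline described above (aggregate features from several scales, then map them to a density map with a 1 × 1 convolution) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not say how the adaptive weights are computed, so the softmax over one score per branch is an assumption, and the names `aggregate_scales`, `conv1x1`, and `crowd_count` are hypothetical. The only standard fact used is that a density map sums to the estimated count.

```python
import math

def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_scales(branches, scores):
    """Fuse feature maps from several scale branches by a weighted sum.

    branches: list of feature maps, each indexed as [channel][row][col].
    scores:   one raw score per branch (assumed; learned in a real network).
    """
    weights = softmax(scores)
    channels = len(branches[0])
    height = len(branches[0][0])
    width = len(branches[0][0][0])
    return [[[sum(w * b[c][y][x] for w, b in zip(weights, branches))
              for x in range(width)]
             for y in range(height)]
            for c in range(channels)]

def conv1x1(features, kernel, bias=0.0):
    """1 x 1 convolution: a per-pixel weighted sum across channels."""
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    return [[bias + sum(kernel[c] * features[c][y][x] for c in range(channels))
             for x in range(width)]
            for y in range(height)]

def crowd_count(density_map):
    """The estimated count is the sum of the density map."""
    return sum(sum(row) for row in density_map)

# Toy example: two scale branches, two channels, a 2 x 2 spatial grid.
branch_a = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]
branch_b = [[[3, 0], [0, 1]], [[0, 0], [2, 2]]]
fused = aggregate_scales([branch_a, branch_b], scores=[0.0, 0.0])
density = conv1x1(fused, kernel=[0.5, 1.0])
count = crowd_count(density)
```

With equal scores both branches receive weight 0.5; raising one score shifts the fused features toward that branch, which is the sense in which the aggregation in this sketch is adaptive.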

Highlights

  • We observed that our Multi-scale Context Aggregation Network (MSCANet) achieved the best mean squared error (MSE) and competitive mean absolute error (MAE) compared with the other methods, which supports the effectiveness of MSCANet

  • We find that the density maps generated by MSCANet are very close to the ground truth density maps, which further supports the superiority of our model

  • Our MSCANet achieved the top performance on both the MAE and MSE metrics
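
The two metrics named in the highlights compare predicted and ground truth counts per image. A minimal sketch follows; the function names are illustrative, and the `root` flag is an assumption added because crowd counting papers often report the square root of the mean squared count error under the name MSE. This excerpt does not state which convention the paper uses.

```python
import math

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mean_squared_error(predicted, actual, root=False):
    """Average squared count difference; set root=True for its square root."""
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mse) if root else mse

# Toy counts for three images (made-up numbers, not results from the paper).
predicted_counts = [10, 20, 30]
true_counts = [12, 18, 33]
mae = mean_absolute_error(predicted_counts, true_counts)
mse = mean_squared_error(predicted_counts, true_counts)
rmse = mean_squared_error(predicted_counts, true_counts, root=True)
```

Because the squared term penalizes large per-image errors more heavily, MSE is usually read as a measure of robustness, while MAE reflects average accuracy.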


Introduction

Crowd counting is an indispensable component of smart crowd analysis: it counts the number of people and describes the crowd distribution. It plays a critical role in many areas, such as video surveillance [1], public security [2], human behavior analysis [3,4], and smart cities [5,6,7]. Because of frequent scale variations and severe occlusions, in addition to diverse crowd distributions, it is often very difficult to describe the crowd accurately, especially in overcrowded scenes.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
