Abstract

Many CNN-based methods that regress the crowd count from a density map have recently been proposed for crowd counting. However, in highly crowded scenes these methods do not handle two problems well: head scale variations caused by perspective change, and background noise. To address both, we introduce a multi-scale feature fusion network with a multi-level supervised path that produces high-quality density maps. Our model uses the first 13 layers of VGG16 as the backbone. The multi-level supervised path employs a multi-level dilated convolution module (MLD) to supervise the whole network at multiple levels and to generate an attention map for the density map, which is used to handle scale variations. The other path fuses multi-scale features to generate the density map, using a soft spatial-channel attention module (SSCA) that produces a saliency weight map of the same size. The final density map is obtained by multiplying the feature map by the attention map. In addition, a new objective function is proposed to train the network. Extensive experiments show that our method outperforms other networks on four challenging datasets (UCF_CC_50, ShanghaiTech, UCF-QNRF, and WorldExpo’10).
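The final fusion step described above can be illustrated with a minimal sketch. It assumes the multiplication is element-wise over two maps of identical size, and that the crowd count is read off as the sum of the resulting density map, as is usual for density-map counting. The function names `fuse_density` and `estimate_count` and the toy 2x2 values are hypothetical and are not taken from the paper; plain Python lists stand in for the tensors a real implementation would use.

```python
def fuse_density(features, attention):
    """Element-wise product of a feature map and an attention map.

    Both arguments are H x W nested lists (hypothetical stand-ins for tensors).
    """
    same_shape = len(features) == len(attention) and all(
        len(f_row) == len(a_row) for f_row, a_row in zip(features, attention)
    )
    if not same_shape:
        raise ValueError("feature map and attention map must have the same shape")
    return [
        [f * a for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(features, attention)
    ]


def estimate_count(density):
    """Sum all density values to obtain the estimated number of people."""
    return sum(sum(row) for row in density)


# Toy example: attention values near 0 suppress background responses.
features = [[0.5, 0.25], [1.0, 0.0]]
attention = [[1.0, 0.0], [0.5, 1.0]]

density = fuse_density(features, attention)
count = estimate_count(density)
```

In this toy case the attention map zeroes out the top-right response and halves the bottom-left one, which is the intended effect of weighting the density features by a saliency map before counting.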
