Abstract

Because of large variations in scale, counting in scenes of different densities is an extremely difficult task. In this paper, we propose a new attention-based, self-weighted multi-scale fusion network named SMFNet that addresses multi-scale changes and can significantly improve crowd counting in surveillance scenes. The proposed SMFNet uses VGG as the backbone network to extract multi-scale features, uses an SMFNet as the neck to fuse the multi-scale features, and uses an atrous spatial pyramid pooling (ASPP) network and ordinary convolution as the head to generate both an attention map and a density map. The attention map highlights crowd regions in the image and contributes to a high-quality density map, while the density map records the crowd distribution. The number of people in the image is obtained by summing the pixel values of the density map. We conduct experiments on three crowd counting datasets and one vehicle counting dataset to show that the proposed SMFNet can improve on state-of-the-art counting methods.
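The relationship between a density map and the final count can be illustrated with a minimal NumPy sketch. It is not the SMFNet implementation: the helper names `make_density_map` and `count_from_density`, the Gaussian width `sigma`, and the element-wise use of the attention map are all assumptions made for illustration, since the abstract does not specify how the attention map is applied.

```python
import numpy as np

def make_density_map(shape, points, sigma=2.0):
    """Hypothetical helper: place one unit-mass Gaussian per annotated position."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # normalise so each person contributes exactly 1
    return density

def count_from_density(density, attention=None):
    """Sum the pixel values of the density map to obtain the count.

    Assumption: if an attention map in [0, 1] is given, it is applied
    element-wise before summing, which suppresses background responses.
    """
    if attention is not None:
        density = density * attention
    return float(density.sum())

# Three annotated positions (row, column) in a 64x64 image.
points = [(10, 12), (30, 40), (45, 20)]
density = make_density_map((64, 64), points)
count = count_from_density(density)

# An attention map that keeps only the top half of the image.
attention = np.zeros((64, 64))
attention[:32, :] = 1.0
masked_count = count_from_density(density, attention)
```

Because each Gaussian is normalised to unit mass, `count` equals the number of annotated positions up to floating-point error, while `masked_count` only includes the mass that falls inside the attended region.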
