Abstract

Due to non-uniform density and variations in scale and perspective, estimating the crowd count in scenes with different degrees of crowding is an extremely challenging task. Most deep learning models use pooling operations, so the density map at the original resolution can only be recovered through a final upsampling step. This paper aims to solve the problem of local spatial information being lost through pooling in density map estimation. To this end, we propose a dilated convolutional neural network with global self-attention, named DCGSA. In particular, we introduce a Global Self-Attention (GSA) module that provides global context as guidance for low-level features to select person location details, and a Pyramid Dilated Convolution (PDC) module that extracts channel-wise and pixel-wise features more precisely. Extensive experiments on several crowd datasets show that our method achieves lower counting error and better density maps than recent state-of-the-art methods. Notably, it also performs well on the sparse UCSD dataset.
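To illustrate the idea of preserving spatial resolution without pooling, the sketch below shows a minimal pyramid of parallel dilated convolutions in PyTorch. The class name, dilation rates, and 1x1 fusion layer are illustrative assumptions for exposition only, not the paper's exact PDC design.

```python
import torch
import torch.nn as nn

class PyramidDilatedConv(nn.Module):
    """Illustrative pyramid of parallel dilated 3x3 convolutions.

    Branches with different dilation rates enlarge the receptive field
    without any pooling, so the spatial size of the feature map is kept.
    Names and rates are assumptions, not taken from the paper.
    """

    def __init__(self, in_channels, out_channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding = dilation keeps the spatial size for a 3x3 kernel
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Fuse the concatenated branch outputs back to out_channels.
        self.fuse = nn.Conv2d(out_channels * len(rates), out_channels,
                              kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))

if __name__ == "__main__":
    x = torch.randn(1, 64, 96, 128)   # e.g. a backbone feature map
    y = PyramidDilatedConv(64, 64)(x)
    print(y.shape)                     # spatial size preserved: (1, 64, 96, 128)
```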
