Abstract

Crowd counting using deep convolutional neural networks (CNNs) has made encouraging progress in recent years. Nevertheless, efficiently handling scale variation and complex backgrounds remains a major challenge. To address this, we present MARNet, a Multi-scale Attention Recalibration Network for more accurate crowd counting, which introduces and integrates two key modules. First, a Feature Pyramid Module (FPM) performs multi-scale feature enhancement using multiple dilated convolutions with different dilation rates, thereby providing rich contextual information for subsequent operations. Second, to take full advantage of this contextual information, a Feature Recalibration Module (FRM) is devised by integrating a Dimension Attention (DA) block with a Region Recalibration (RR) block. The DA block models the semantic dependencies between different dimensions of the contextual information, while the RR block reassigns attention weights to different regions based on these semantic dependencies. By integrating the two blocks, the proposed method can focus on crowd features and thus estimate crowd density accurately. Extensive experiments on multiple public crowd counting datasets demonstrate that our method significantly outperforms most existing methods in both counting accuracy and the quality of the generated density maps.
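The data flow through the two modules can be illustrated with a minimal, dependency-free Python sketch. It is not the MARNet implementation. The fixed 3x3 averaging kernels, the dilation rates (1, 2, 3), concatenating the pyramid branches along the channel axis, and the sigmoid gating used for the DA block (one gate per channel) and the RR block (one gate per spatial position) are all assumptions made for illustration. The function names are hypothetical. In the actual network these operations would use learned convolutional weights.

```python
import math
from typing import List, Sequence

Map2D = List[List[float]]
Tensor = List[Map2D]  # channels x height x width


def dilated_conv3x3(x: Map2D, rate: int) -> Map2D:
    """3x3 mean filter with dilation `rate` and zero padding (same output size).
    Assumption: a fixed averaging kernel stands in for learned weights."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di * rate, j + dj * rate
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        acc += x[ii][jj]
            out[i][j] = acc / 9.0
    return out


def feature_pyramid(x: Tensor, rates: Sequence[int] = (1, 2, 3)) -> Tensor:
    """Hypothetical FPM sketch: filter every channel at each dilation rate,
    then concatenate all branches along the channel axis (assumed fusion)."""
    return [dilated_conv3x3(ch, r) for r in rates for ch in x]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def dimension_attention(x: Tensor) -> Tensor:
    """Hypothetical DA sketch (assumed SE-style channel gating):
    each channel is scaled by a sigmoid of its global average."""
    gates = [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in x]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]


def region_recalibration(x: Tensor) -> Tensor:
    """Hypothetical RR sketch (assumed spatial gating): each position is scaled
    by a sigmoid of the cross-channel mean at that position."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    gate = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]
    return [[[gate[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in x]


def feature_recalibration(x: Tensor) -> Tensor:
    """Hypothetical FRM sketch: DA block followed by RR block."""
    return region_recalibration(dimension_attention(x))


if __name__ == "__main__":
    # Toy input: 2 channels, each a 4x4 map.
    x = [[[float(i * 4 + j) for j in range(4)] for i in range(4)] for _ in range(2)]
    y = feature_recalibration(feature_pyramid(x))
    print(len(y), len(y[0]), len(y[0][0]))  # 3 rates x 2 channels -> 6 4 4
```

Because each gate lies strictly between 0 and 1, the recalibration step can only rescale features, never change the shape produced by the pyramid. In this sketch, the rate-2 and rate-3 branches see context that lies farther from each position than the rate-1 branch does. That is the intuition behind using several dilation rates to cope with heads of different sizes.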
