Abstract

Crowd counting with density estimation has been an active research area because of its significant applications in public security, video surveillance, and traffic monitoring. However, crowd counting in congested scenes often suffers from obstacles such as severe occlusion, large scale variation, and noise interference. In this paper, we propose a CNN-based crowd counting and density estimation model that uses the first ten layers of a modified VGG16 followed by dilated convolution layers as its framework, improved by attention-aware modules with residual connections. To tackle noise interference, convolutional block attention modules are introduced into the deep network to separate foreground from background so that the network focuses on the information of interest, refining the deeper features of the input image. To improve information transmission and reuse, residual connections link three attention blocks. Meanwhile, the dilated convolution layers maintain larger receptive fields and yield high-resolution density maps. The proposed method has been evaluated on three public benchmarks, i.e. ShanghaiTech (Parts A and B), UCF-QNRF, and MALL, achieving mean absolute errors of 64.6 and 8.3, 113.8, and 1.68, respectively. These results outperform several existing approaches, indicating that the proposed model is accurate and robust and is suitable for crowd counting and density estimation in a variety of congested scenes.
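
The remark that dilated convolutions maintain larger receptive fields can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the function names are hypothetical, and the three-layer stack with a dilation rate of 2 is an assumed configuration chosen only for illustration. It relies on the standard relation that a kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions, and that each stride-1 layer widens the receptive field by (k - 1) * d.

```python
# Illustrative sketch only: layer count and dilation rate are assumptions,
# not the configuration reported in the paper.

def effective_kernel_size(kernel_size, dilation):
    """Span of input positions covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


plain = receptive_field([(3, 1)] * 3)    # three ordinary 3x3 layers
dilated = receptive_field([(3, 2)] * 3)  # three 3x3 layers, dilation 2
print(f"plain: {plain}x{plain}, dilated: {dilated}x{dilated}")
```

Under these assumptions the dilated stack sees a 13x13 window compared with 7x7 for the ordinary stack, with the same number of weights and no pooling, which is why the output density map can stay at a higher resolution.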
