Abstract

Crowd counting is an important yet challenging task in intelligent video surveillance and urban security systems. Its performance has improved significantly with the rise of convolutional neural networks (CNNs). However, accurate and efficient crowd counting in congested scenes remains under-explored because of scale variation and cluttered backgrounds. To address these problems, we propose a biologically inspired crowd counting method named the group-split attention network (GSANet). GSANet consists of three principal modules: the group-split (GS) module, the dual-aware attention module, and the aggregation module. The GS module splits the input feature map into groups and processes the subfeatures of each group in parallel to reduce computational cost. The dual-aware attention module combines spatial and channel-wise information to reduce estimation error in background regions. The aggregation module adopts a learning-based cross-group strategy to aggregate and fuse feature maps along different channel dimensions. Extensive experimental results on five benchmark crowd datasets demonstrate that GSANet achieves superior performance in terms of both accuracy and efficiency.
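
As a rough illustration of why splitting a feature map into channel groups can lower computational cost, the sketch below counts the weights of a standard convolution versus a grouped one. It assumes the GS module behaves like an ordinary grouped convolution, which the abstract does not state; the function names `conv_weight_count` and `split_channels`, the 64-channel width, and the choice of 4 groups are all hypothetical and are not taken from GSANet.

```python
def conv_weight_count(c_in, c_out, kernel, groups=1):
    """Count the weights of a 2-D convolution layer (bias ignored).

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


def split_channels(channels, groups):
    """Split a list of channel indices into equal contiguous groups."""
    if len(channels) % groups:
        raise ValueError("channel count must be divisible by groups")
    size = len(channels) // groups
    return [channels[i * size:(i + 1) * size] for i in range(groups)]


# Hypothetical layer: 64 input channels, 64 output channels, 3x3 kernel.
full = conv_weight_count(64, 64, 3)               # no grouping
grouped = conv_weight_count(64, 64, 3, groups=4)  # assumed 4 groups

print(full, grouped, full // grouped)
print(split_channels(list(range(8)), 4))
```

Under these assumptions the grouped layer needs a quarter of the weights of the ungrouped one. Because each group no longer sees the other groups' channels, some form of cross-group fusion is needed afterwards, which is the role the abstract assigns to the aggregation module.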
