Abstract

Existing crowd counting methods usually adopt attention mechanisms to tackle background noise, or apply multilevel features or multiscale context fusion to tackle scale variation. However, these approaches address the two problems separately. In this paper, we propose a hybrid attention network (HANet) with progressively embedded scale-context (PES) information, which enables the network to suppress noise and adapt to head scale variation simultaneously. We build the hybrid attention mechanism from parallel spatial attention and channel attention modules, which make the network focus more on human head areas and reduce the interference of background objects. In addition, we embed scale-context into the hybrid attention along the spatial and channel dimensions to alleviate the counting errors caused by variation in perspective and head scale. Finally, we propose a progressive learning strategy that cascades multiple hybrid attention modules, each embedding a different scale-context, so that scale-context information is gradually integrated into the current feature map from global to local. Ablation experiments show that the network architecture can gradually learn multiscale features and suppress background noise. Extensive experiments demonstrate that HANet obtains state-of-the-art counting performance on five mainstream datasets.
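As a rough illustration of the parallel spatial and channel attention idea described above, the following minimal sketch reweights a feature map with one weight per channel and one weight per pixel. It is not the authors' implementation: the function names, the parameter-free pooling-plus-sigmoid gates, and the multiplicative combination of the two branches are all assumptions made for illustration, and the scale-context embedding and progressive cascading are omitted.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x):
    """One weight per channel, from global average pooling.

    Assumption: no learned layers; a real module would learn this mapping.
    """
    return [
        sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
        for ch in x
    ]


def spatial_attention(x):
    """One weight per pixel, from the mean over channels (also unlearned)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def hybrid_attention(x):
    """Apply both branches to the same input and combine them.

    Assumption: the branches are merged by element-wise multiplication.
    x is a feature map stored as nested lists with shape [C][H][W].
    """
    ca = channel_attention(x)
    sa = spatial_attention(x)
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[x[k][i][j] * ca[k] * sa[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


# Toy feature map: 2 channels, 2x2 pixels, one strong response at (0, 0).
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]
attended = hybrid_attention(features)
```

The output keeps the input shape. Because both gates lie strictly between 0 and 1, every activation is scaled down, with the strongest channel and pixel attenuated the least. A trained network would replace the fixed pooling gates with learned layers.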
