Abstract

Context information plays a crucial role in many computer vision tasks, helping deep networks build a more comprehensive understanding of a scene from its surroundings. However, most existing crowd counting methods overlook context extracted from both global and local views. To alleviate this problem, this paper proposes a novel Context Attention Fusion Network (CAFNet) for crowd counting. The core idea behind CAFNet is the interaction of multiple kinds of context: local context, cross-level context, and cross-layer context. To explore local context, we design a local context aggregation module that extracts local semantic information hierarchically and then integrates it adaptively. To exploit cross-level context, a guidance attention fusion module fuses the low-level feature map, as guidance for the high-level context, so that spatial details are effectively compensated. To make full use of cross-layer features, a multi-layer context fusion module exchanges information across layers to generate a high-resolution density map. Experimental results on four challenging datasets show that the proposed CAFNet delivers impressive results compared with other state-of-the-art crowd counting models.
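The abstract does not specify how the cross-level guidance is implemented, so the following is only a minimal sketch of that general idea: a low-level feature map produces a spatial attention mask that gates upsampled high-level context before fusion. The class name, channel sizes, and layer choices here are illustrative assumptions, not the paper's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidanceAttentionFusion(nn.Module):
    """Hypothetical sketch of cross-level guidance: the low-level map
    gates upsampled high-level context, then the two streams are fused."""

    def __init__(self, low_ch: int, high_ch: int, out_ch: int):
        super().__init__()
        # 1x1 conv turns the low-level features into a single-channel guidance mask
        self.guide = nn.Sequential(
            nn.Conv2d(low_ch, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # 3x3 conv fuses the concatenated low- and high-level streams
        self.fuse = nn.Conv2d(low_ch + high_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample high-level context to the low-level spatial resolution
        high = F.interpolate(high, size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Spatial details from the low level re-weight the high-level context
        high = high * self.guide(low)
        return self.fuse(torch.cat([low, high], dim=1))

# Toy usage: fuse a 1/8-resolution context map with a 1/4-resolution detail map
low = torch.randn(1, 128, 64, 64)
high = torch.randn(1, 256, 32, 32)
print(GuidanceAttentionFusion(128, 256, 64)(low, high).shape)  # (1, 64, 64, 64)
```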
