Abstract

Context information plays a crucial role in many computer vision tasks because it lets deep networks draw on a scene's wider surroundings. However, most existing crowd counting methods overlook the importance of context information extracted from both global and local views. To address this problem, this paper proposes a novel Context Attention Fusion Network (CAFNet) for crowd counting. The core idea behind CAFNet is the interaction of multiple kinds of context information: local context, cross-level context, and cross-layer context. To explore local context, we design a local context aggregation module that extracts hierarchical local semantic information and then integrates it adaptively. To utilize cross-level context, we design a guidance attention fusion module that fuses the low-level feature map as guidance for high-level context information, so that spatial details can be effectively compensated. To make full use of cross-layer features, we develop a multi-layer context fusion module that exchanges information across multiple layers to generate a high-resolution density map. Experimental results on four challenging datasets show that CAFNet delivers strong results compared with other state-of-the-art crowd counting models.
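CAFNet's final output is a density map. In density-map crowd counting, the estimated number of people is the sum of that map. The abstract does not say how the ground-truth maps are built, so the sketch below assumes the common convention of placing one normalized 2-D Gaussian at each annotated head position with a fixed `sigma`. The helper names `gaussian_density_map` and `count_from_density` and the value `sigma=4.0` are hypothetical illustrations, not part of CAFNet, and the network itself is not modeled here.

```python
import math

def gaussian_density_map(points, height, width, sigma=4.0):
    """Hypothetical helper: build a density map by placing a 2-D Gaussian at
    each annotated head position (row, col). Each kernel is renormalized over
    the pixels that fall inside the image, so every annotation contributes
    exactly 1 to the map's total, even near the borders."""
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at 3 sigma
    for (r0, c0) in points:
        # Collect unnormalized kernel weights that lie inside the image.
        kernel = []
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                w = math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                kernel.append((r, c, w))
        total = sum(w for _, _, w in kernel)  # > 0 because (r0, c0) is included
        for r, c, w in kernel:
            density[r][c] += w / total
    return density

def count_from_density(density):
    """Estimated crowd count: the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)

# Four annotated heads, including one in the corner of the image.
heads = [(10, 12), (30, 40), (31, 42), (0, 0)]
dmap = gaussian_density_map(heads, height=48, width=64)
print(round(count_from_density(dmap), 6))
```

Renormalizing each kernel inside the image keeps the map's sum equal to the number of annotations. This is why a predicted map, such as the high-resolution one CAFNet produces, can be read directly as a count.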
