Abstract

The rapidly growing demands of real-world crowd security and commercial applications have drawn widespread attention to crowd counting, a computer vision task that aims to count all persons appearing in a given image. Recent state-of-the-art crowd counting methods commonly follow the density map regression paradigm, in which a density map is estimated from the given image and summed to obtain the total count. Despite achieving impressive progress, these methods remain significantly challenged by complicated scenarios with severe scale variations of persons and cluttered backgrounds. Considering that localization-based counting methods, though less accurate, are able to learn more discriminative representations of persons by locating their positions, we propose a novel Localization Guided Transformer (LGT) framework in this work. The LGT uses the knowledge learned by a leading localization-based method to more accurately guide density map estimation for crowd counting. Specifically, our framework first exploits a point-based model with two output heads, i.e., a regression head and a classification head, to predict head point proposals and point confidences, respectively. Then, an intermediate multi-scale feature map is extracted from the shared backbone network and actively fused with the point location information. Afterwards, the fused features are fed into a Transformer module that explores patch-wise interactions via the self-attention mechanism, yielding a more discriminative representation for high-quality density map estimation. Extensive experiments and comparisons with state-of-the-art methods show the effectiveness of our proposed framework.
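To make the described pipeline concrete, the following is a minimal PyTorch sketch of the LGT idea as summarized in the abstract: a shared backbone, a point-based localization branch with regression and classification heads, fusion of features with point-location information, a Transformer encoder over patch tokens, and a density-map head whose output is summed into a count. All module names, layer sizes, and fusion details are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch of the LGT pipeline (hypothetical names and sizes,
# not the authors' implementation).
import torch
import torch.nn as nn


class LGTSketch(nn.Module):
    def __init__(self, feat_dim=256, num_heads=8, depth=2):
        super().__init__()
        # Shared backbone producing a (downsampled) feature map; a real model
        # would use a multi-scale CNN backbone here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=16, padding=1),
            nn.ReLU(inplace=True),
        )
        # Localization branch: point regression (head point proposals) and
        # classification (point confidence) heads.
        self.reg_head = nn.Conv2d(feat_dim, 2, kernel_size=1)   # (x, y) offsets
        self.cls_head = nn.Conv2d(feat_dim, 1, kernel_size=1)   # confidence logits
        # Fuse backbone features with point-location information.
        self.fuse = nn.Conv2d(feat_dim + 3, feat_dim, kernel_size=1)
        # Transformer encoder exploring patch-wise interactions via self-attention.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Density map regression head.
        self.density_head = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, image):
        feat = self.backbone(image)                     # (B, C, H, W)
        offsets = self.reg_head(feat)                   # head point proposals
        conf = torch.sigmoid(self.cls_head(feat))       # point confidence
        # Inject point-location cues into the shared features.
        fused = self.fuse(torch.cat([feat, offsets, conf], dim=1))
        b, c, h, w = fused.shape
        tokens = fused.flatten(2).transpose(1, 2)       # (B, H*W, C) patch tokens
        tokens = self.encoder(tokens)                   # self-attention over patches
        fused = tokens.transpose(1, 2).reshape(b, c, h, w)
        density = torch.relu(self.density_head(fused))  # non-negative density map
        count = density.sum(dim=(1, 2, 3))              # total count = sum of density
        return density, count, offsets, conf


# Usage: predicted density map and count for a dummy image.
model = LGTSketch()
density, count, offsets, conf = model(torch.randn(1, 3, 512, 512))
```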

