Abstract

Both global dependency and local correlation are crucial for handling scale variation in crowds. However, most previous methods fail to take both factors into consideration simultaneously. To address this issue, a deformable channel non-local network for crowd counting, abbreviated as DCNLNet, is proposed, which simultaneously learns global context information and an adaptive local receptive field. Specifically, the proposed DCNLNet consists of two carefully designed modules: a deformable channel non-local block (DCNL) and a spatial attention feature fusion block (SAFF). The DCNL encodes long-range dependencies between pixels with a channel non-local operation and adaptive local correlation with deformable convolution, which helps improve the spatial discrimination of features. The SAFF aggregates cross-level information: it lets features from different depths interact and learns specific weights for the feature maps with spatial attention. Extensive experiments are performed on three crowd counting benchmark datasets, and the experimental results indicate that the proposed DCNLNet achieves compelling performance compared with other representative counting models.
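
The abstract does not give the exact formulation of the channel non-local operation, so the following is only a minimal sketch of one common way such an operation can be written, not the authors' implementation. It assumes a feature map flattened to C channels of N pixels each, a dot-product affinity between channels, a row-wise softmax, and a residual connection. The names `channel_non_local` and `softmax` are hypothetical, and the deformable convolution and SAFF parts are not covered.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_non_local(features):
    """Illustrative channel non-local operation (an assumed formulation).

    features: list of C channels, each a list of N flattened pixel values.
    Returns (output, weights), where weights is the C x C attention matrix.
    """
    num_channels = len(features)
    num_pixels = len(features[0])

    # Channel affinity: dot product between every pair of channel maps (C x C).
    affinity = [
        [sum(a * b for a, b in zip(ci, cj)) for cj in features]
        for ci in features
    ]

    # Normalize each row so the weights over channels sum to 1.
    weights = [softmax(row) for row in affinity]

    # Each output channel is the input channel plus a weighted mix of all
    # channels, so every pixel of every channel contributes to the result.
    output = []
    for i in range(num_channels):
        mixed = [
            sum(weights[i][j] * features[j][p] for j in range(num_channels))
            for p in range(num_pixels)
        ]
        output.append([x + m for x, m in zip(features[i], mixed)])
    return output, weights


if __name__ == "__main__":
    out, w = channel_non_local([[1.0, 0.0], [0.0, 1.0]])
    print(out)
    print(w)
```

Because the affinity is computed over whole channel maps, each weight summarizes all spatial positions at once, which is one way to read the "global context" the abstract refers to. A real model would operate on tensors and typically add learned projections before the affinity step.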
