Abstract

We study and address the multi-view crowd counting (MVCC) problem, which poses more realistic challenges than single-view crowd counting and better supports crowd-management and public-safety systems. The core challenge is to distill and aggregate the useful, complementary information across multiple camera views into powerful ground-plane representations for wide-area crowd analysis. In this paper, we present a graph-based, multi-view learning model called Co-Communication Graph Convolutional Network (CoCo-GCN) to jointly investigate intra-view contextual dependencies and inter-view complementary relations. More specifically, CoCo-GCN projects each camera view into a view-agnostic graph interaction space for efficient contextual reasoning, and extends this intra-view reasoning with a novel Graph Communication Layer (GCL) that also takes between-graph (cross-view) complementary information into account. Moreover, CoCo-GCN uses a new Co-Memory Layer (CoML) to jointly coarsen the graphs and close the ‘representational gap’ among them, further exploiting the compositional nature of graphs and learning more consistent representations. Finally, the jointly learned multi-view features can be easily fused into ground-plane representations for wide-area crowd counting. Experiments show that the proposed CoCo-GCN achieves state-of-the-art results on three MVCC datasets, i.e., PETS2009, DukeMTMC, and City Street, significantly improving scene-level accuracy over previous models.
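The abstract does not give implementation details, but the overall pipeline (per-view graph projection, intra-view graph reasoning, cross-view graph communication) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the class names (GraphProjection, GCNLayer, GraphCommunication), the soft-assignment projection, the similarity-based adjacency, and the cross-view attention are stand-ins for the paper's actual graph interaction space, intra-view reasoning, and GCL; the CoML coarsening step and the final ground-plane fusion are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphProjection(nn.Module):
    """Projects a view's CNN feature map (B, C, H, W) onto a small set of
    graph nodes via learned soft assignment. The soft-assignment design is
    an assumption; the abstract does not specify the projection."""

    def __init__(self, in_channels, num_nodes, node_dim):
        super().__init__()
        self.assign = nn.Conv2d(in_channels, num_nodes, kernel_size=1)
        self.embed = nn.Conv2d(in_channels, node_dim, kernel_size=1)

    def forward(self, x):
        a = self.assign(x).flatten(2).softmax(dim=-1)   # (B, N, H*W)
        z = self.embed(x).flatten(2)                    # (B, D, H*W)
        return torch.bmm(a, z.transpose(1, 2))          # (B, N, D) node features


class GCNLayer(nn.Module):
    """Intra-view contextual reasoning: one graph convolution over an
    adjacency built from node-feature similarity (also an assumption)."""

    def __init__(self, node_dim):
        super().__init__()
        self.weight = nn.Linear(node_dim, node_dim)

    def forward(self, nodes):                           # nodes: (B, N, D)
        adj = torch.bmm(nodes, nodes.transpose(1, 2)).softmax(dim=-1)
        return F.relu(nodes + self.weight(torch.bmm(adj, nodes)))


class GraphCommunication(nn.Module):
    """Inter-view message passing in the spirit of the GCL: each view's
    nodes attend over all other views' nodes to gather complementary cues."""

    def __init__(self, node_dim):
        super().__init__()
        self.q = nn.Linear(node_dim, node_dim)
        self.k = nn.Linear(node_dim, node_dim)
        self.v = nn.Linear(node_dim, node_dim)
        self.scale = node_dim ** -0.5

    def forward(self, views):                           # list of (B, N, D)
        out = []
        for i, x in enumerate(views):
            others = torch.cat([v for j, v in enumerate(views) if j != i], dim=1)
            attn = (torch.bmm(self.q(x), self.k(others).transpose(1, 2))
                    * self.scale).softmax(dim=-1)
            out.append(x + torch.bmm(attn, self.v(others)))  # residual update
        return out


# Toy usage: three camera views, each a (batch, channels, H, W) feature map.
proj = GraphProjection(in_channels=64, num_nodes=16, node_dim=32)
gcn = GCNLayer(32)
comm = GraphCommunication(32)

feats = [torch.randn(2, 64, 24, 32) for _ in range(3)]
nodes = [gcn(proj(f)) for f in feats]       # intra-view reasoning per graph
nodes = comm(nodes)                         # cross-view graph communication
print([n.shape for n in nodes])             # 3 x torch.Size([2, 16, 32])
```

In this sketch, each view keeps its own graph while the communication layer exchanges complementary information between graphs; the communicated node features could then be projected back to the image planes and fused on the ground plane, mirroring the pipeline the abstract describes.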
