Abstract
In recent years, researchers have developed many deep-learning-based methods to count crowds in static images. However, far fewer works focus on video-based crowd counting, where the critical challenge of exploiting temporal correlation has not been well explored. This paper proposes a Spatial-Temporal Graph Network (STGN) to achieve efficient and accurate crowd counting in videos by learning pixel-wise and patch-wise relations in local spatial-temporal domains. Specifically, we design a pyramid graph module to leverage multi-scale features. At each scale, we sequentially construct three graphs: a spatial-temporal pixel graph, a temporal patch graph, and a spatial pixel graph, in which we apply the self-attention mechanism to capture pixel-wise relations, learn structure-aware relations, and aggregate local features, respectively. Furthermore, we propose a spatial-aware channel-wise attention to effectively fuse multi-scale features. To demonstrate the effectiveness of the proposed method, we conduct experiments on five crowd counting datasets, including a large-scale video crowd dataset (FDST). Moreover, the proposed model is also applied to a vehicle counting dataset (TRANCOS). The results show that the proposed model outperforms existing spatial-temporal crowd counting models and achieves state-of-the-art performance. The code is available at https://github.com/wuzhe71/STGN.
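To make the two attention ideas named in the abstract concrete, the sketch below shows (1) pixel-wise self-attention computed over features from two consecutive frames, standing in for the spatial-temporal pixel graph, and (2) a spatial-aware channel-wise gating used to reweight features before multi-scale fusion. This is a minimal illustrative sketch in PyTorch: the module names, channel sizes, and the exact fusion rule are assumptions for exposition, not the authors' STGN implementation (see the linked repository for that).

```python
# Hypothetical sketch of pixel-wise spatial-temporal self-attention and
# spatial-aware channel attention; shapes and layer choices are assumptions.
import torch
import torch.nn as nn


class PixelSelfAttention(nn.Module):
    """Self-attention over all pixels of two consecutive frames' feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, cur: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        b, c, h, w = cur.shape
        # Queries come from the current frame; keys/values from both frames.
        q = self.query(cur).flatten(2)                        # (b, c/2, hw)
        kv = torch.cat([cur, prev], dim=3)                    # widen spatially
        k = self.key(kv).flatten(2)                           # (b, c/2, 2hw)
        v = self.value(kv).flatten(2)                         # (b, c,   2hw)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (b, hw, 2hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return cur + out                                      # residual connection


class SpatialAwareChannelAttention(nn.Module):
    """Channel-wise gating whose pooling weights depend on spatial context."""

    def __init__(self, channels: int):
        super().__init__()
        self.context = nn.Conv2d(channels, 1, kernel_size=1)  # spatial weights
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        w_sp = torch.softmax(self.context(x).flatten(2), dim=-1)  # (b, 1, hw)
        pooled = (x.flatten(2) * w_sp).sum(-1)                    # (b, c)
        return x * self.gate(pooled).view(b, c, 1, 1)


if __name__ == "__main__":
    cur, prev = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    fused = SpatialAwareChannelAttention(64)(PixelSelfAttention(64)(cur, prev))
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

In the paper, such attention blocks are applied at each level of a feature pyramid and the reweighted multi-scale outputs are fused before density-map regression; the sketch above only illustrates one scale.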