Abstract
Because video data carries temporal information, video crowd counting offers greater potential than single-frame crowd counting in scenarios that demand high accuracy. However, learning robust inter-frame relationships efficiently and at low cost remains challenging: existing video crowd-counting methods lack explicit temporal correlation modeling, are not robust, and are structurally complex. In this paper, we propose the Frame-Recurrent Video Crowd Counting (FRVCC) framework to address these issues. Specifically, we design a frame-recurrent scheme that recursively relates density maps along the temporal dimension, efficiently exploiting long-term inter-frame knowledge and ensuring the continuity of feature-map responses. FRVCC consists of three plug-in modules: an optical flow estimation module, a single-frame counting module, and a density map fusion module. For the fusion module, we propose the ResTrans network, which robustly learns complementary features between visual-based and correlation-based feature maps through a residual strategy and a vision transformer. To keep the output distribution consistent with the ground-truth distribution, we introduce an adversarial loss that rectifies the training process. Additionally, we release a large-scale synthetic video crowd-counting dataset, CrowdXV, to evaluate the proposed method and further improve its performance. Extensive experiments on several video-counting datasets demonstrate that FRVCC achieves state-of-the-art performance while offering strong generalization, high flexibility, and low complexity.
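To make the frame-recurrent idea concrete, the following is a minimal PyTorch sketch of the recurrence the abstract describes: the previous fused density map is warped toward the current frame by estimated optical flow, then fused with the current frame's single-frame estimate. The module names (flow_net, counter, fusion) and the warping helper are hypothetical stand-ins for the paper's three plug-in modules, whose actual interfaces and internals are not specified here.

```python
# Sketch only: interfaces of flow_net, counter, and fusion are assumptions,
# not the authors' released code.
import torch
import torch.nn.functional as F

def warp(density_prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp the previous density map toward the current frame via optical flow.

    density_prev: (B, 1, H, W) density map of frame t-1
    flow:         (B, 2, H, W) flow mapping frame-t pixels back to frame t-1
    """
    b, _, h, w = density_prev.shape
    # Build a normalized sampling grid in [-1, 1], as grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    grid_x = (xs + flow[:, 0]) / max(w - 1, 1) * 2 - 1
    grid_y = (ys + flow[:, 1]) / max(h - 1, 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2), (x, y) order
    return F.grid_sample(density_prev, grid, align_corners=True)

def count_video(frames, flow_net, counter, fusion):
    """Recursively estimate a density map for each frame in a sequence.

    frames: list of (B, 3, H, W) tensors. flow_net, counter, and fusion
    stand in for the optical flow estimation, single-frame counting, and
    density map fusion plug-in modules, respectively.
    """
    density_prev = None
    densities = []
    for t, frame_t in enumerate(frames):
        visual = counter(frame_t)                    # visual-based density map
        if density_prev is None:
            fused = visual                           # first frame: no history
        else:
            flow = flow_net(frame_t, frames[t - 1])  # (B, 2, H, W)
            warped = warp(density_prev, flow)        # correlation-based map
            fused = fusion(visual, warped)           # ResTrans-style fusion
        densities.append(fused)
        density_prev = fused                         # recurrence over time
    return densities
```

Because each fused map feeds the next step, long-term temporal knowledge propagates through the sequence without storing or reprocessing earlier frames, which is where the claimed efficiency and response continuity come from.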