Abstract

Understanding where people are located and how they are moving about in an environment is critical for operators of large public spaces such as shopping centers, and of large public infrastructure such as airports. Automated analysis of CCTV footage is increasingly being used to address this need through techniques that can count crowd sizes, estimate their density, and estimate the through-put of people into and/or out of a choke-point. A limitation of CCTV-based approaches, however, is the need to train models specific to each view, which, for large environments with hundreds or thousands of cameras, can quickly become problematic. While there has been some success in developing scene-invariant crowd counting and crowd density estimation approaches, much less attention has been given to developing scene-invariant solutions for through-put estimation. In this paper, we investigate the use of convolutional neural network and long short-term memory architectures to estimate pedestrian through-put from arbitrary CCTV viewpoints. To properly develop and demonstrate our approach, we present a new 22-view database featuring 44 hours of pedestrian through-put annotation, containing over 11,000 annotated people; and using this proposed approach we show that we are able to outperform a scene-dependent approach across a diverse set of challenging viewpoints.
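The through-put quantity the paper estimates can be made concrete with a small illustration: over an observation window, through-put at a choke-point is the number of people who cross it in each direction. The sketch below computes that count from hypothetical per-frame pedestrian positions; the track data and function name are illustrative only, and are not the paper's method, which regresses through-put directly from video using a CNN and LSTM.

```python
# Illustrative only: defines the through-put target quantity as
# directional line crossings, given hypothetical pedestrian tracks.
# The paper's approach instead regresses this value directly from
# CCTV frames with a CNN + LSTM, without explicit tracking.

def count_crossings(tracks, line_x=0.0):
    """Count pedestrians whose x-coordinate crosses the line x = line_x.

    tracks: dict mapping a person id to a list of x positions per frame.
    Returns (entering, exiting) counts over the whole window.
    """
    entering = exiting = 0
    for xs in tracks.values():
        for a, b in zip(xs, xs[1:]):
            if a < line_x <= b:        # crossed left -> right
                entering += 1
            elif b <= line_x < a:      # crossed right -> left
                exiting += 1
    return entering, exiting

demo = {
    1: [-1.0, -0.2, 0.3, 1.1],   # walks left -> right once
    2: [0.8, 0.1, -0.4],         # walks right -> left once
    3: [-0.5, -0.1, -0.3],       # never crosses the line
}
print(count_crossings(demo))     # -> (1, 1)
```

A scene-dependent counter would need this virtual line (and a trained detector) configured per camera view; the scene-invariant formulation in the paper avoids that per-view setup.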
