Abstract

Understanding crowd behavior is a pivotal step toward urban scene analysis. It remains a challenging and rarely addressed task because of the complexity of the motion dynamics co-occurring across a given scene, which involve both spatial and temporal dependencies. Unlike mainstream research, which usually treats the temporal and spatial dependencies in a crowd separately, this paper presents a deep end-to-end approach that considers the spatiotemporal information jointly, leading to a rich understanding of crowd behavior. We first extract displacement information describing crowd motion patterns from tracklets/trajectories. This information is then fed into a convolutional layer to learn the underlying motion patterns and create high-level representations. The derived representations serve as inputs to a long short-term memory-based architecture that learns the underlying spatiotemporal cues in a single operation for the entire crowd in a given scene. We evaluate our approach on widely used, large-scale benchmark datasets for three critical applications: pedestrian path forecasting, destination estimation, and holistic crowd behavior classification. The results show a substantial improvement over recent works.
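The first stage described above, extracting per-step displacement vectors from tracklets, can be sketched minimally as follows. This NumPy snippet is an illustrative assumption about the feature extraction (the function name and data layout are hypothetical, not the authors' implementation):

```python
import numpy as np

def tracklet_displacements(tracklet):
    """Per-step displacements (dx, dy) along a tracklet of (x, y) positions.

    Hypothetical helper: the paper's exact feature encoding is not given
    in the abstract; this only illustrates the displacement idea.
    """
    points = np.asarray(tracklet, dtype=float)
    # Difference between consecutive positions yields the motion pattern
    # that would be passed to the convolutional layer.
    return np.diff(points, axis=0)

# A short tracklet: three (x, y) positions of one pedestrian over time.
track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
disp = tracklet_displacements(track)
print(disp.tolist())  # [[1.0, 0.5], [1.0, 1.0]]
```

In the full pipeline, such displacement sequences would be stacked across all pedestrians in the scene before the convolutional and LSTM stages.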
