Abstract

In this paper, we introduce a novel yet challenging research problem, interactive crowd video generation, committed to producing diverse and continuous crowd video, and relieving the difficulty of insufficient annotated real-world datasets in crowd analysis. Our goal is to recursively generate realistic future crowd video frames given few context frames, under the user-specified guidance, namely individual positions of the crowd. To this end, we propose a deep network architecture specifically designed for crowd video generation that is composed of two complementary modules, each of which combats the problems of crowd dynamic synthesis and appearance preservation respectively. Particularly, a spatio-temporal transfer module is proposed to infer the crowd position and structure from guidance and temporal information, and a point-aware flow prediction module is presented to preserve appearance consistency by flow-based warping. Then, the outputs of the two modules are integrated by a self-selective fusion unit to produce an identity-preserved and continuous video. Unlike previous works, we generate continuous crowd behaviors beyond identity annotations or matching. Extensive experiments show that our method is effective for crowd video generation. More importantly, we demonstrate the generated video can produce diverse crowd behaviors and be used for augmenting different crowd analysis tasks, i.e., crowd counting, anomaly detection, crowd video prediction. Code is available at https://github.com/Icep2020/CrowdGAN.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call