Abstract

Surveillance videos which record crowd behaviors have dramatically increased due to the wide applications. A quick view of such crowd surveillance video in a constrained time is an increasing demand because it always contain a huge number of redundancy frames. In this paper, we focus on summarization of crowd surveillance videos. But it is not easy due to two reasons. First, how to make the decision to keep or discard a subshot from the input surveillance video stream so that the summary can outline the main behaviors of the crowd over a limited frames sequence. Second, how to maintain performance of summarization model for long surveillance videos. To tackle these challenges, we formulate surveillance video summarization as a sequential decision-making process and train the summarization network with reinforcement learning-based framework. A novel crowd location-density reward is proposed to teach summarization network to produce high-quality summaries. In addition, a summarization network with three layers LSTM is designed to maintain performance across longer time spans. Extensive experiments on three public crowd surveillance videos datasets show that the proposed method achieves state-of-the-art performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call