Abstract

The growing rate of public space CCTV installations has generated a need for automated methods for exploiting video surveillance data, including scene understanding, query, behaviour annotation and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of similar layout) is not generally exploited to improve automated surveillance tasks and reduce manual effort. Exploiting commonality between different scenes, and sharing any supervised annotations among them, is nevertheless challenging: some scenes are completely unrelated, so any information sharing between them would be detrimental, while others share only a subset of common activities, so information sharing is useful only if it is selective. Moreover, semantically similar activities, which should be modelled together and shared across scenes, may have quite different pixel-level appearance in each scene. To address these issues, we develop a new framework for distributed multiple-scene global understanding that clusters surveillance scenes by their ability to explain each other's behaviours, and further discovers which subset of activities is shared versus scene-specific within each cluster. We show how to use this structured representation of multiple scenes to improve common surveillance tasks, including scene activity understanding, cross-scene query-by-example, behaviour classification with reduced supervised labelling requirements, and video summarization. In each case we demonstrate how our multi-scene model improves on a collection of standard single-scene models and a flat model of all scenes.
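As a rough illustration of the scene-clustering idea (a minimal sketch, not the authors' exact algorithm), scenes can be scored by how well each scene's learned activity model explains every other scene's video clips, and then grouped on that cross-explanation affinity. The per-scene models, their `log_likelihood` method and the helper names below are assumptions introduced only for illustration.

```python
# Minimal sketch of clustering scenes by their ability to explain each
# other's behaviours. The per-scene activity models (with a hypothetical
# `log_likelihood(clips)` method) and `clips_by_scene` are assumptions.
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_scenes(models, clips_by_scene, n_clusters=4):
    scenes = list(models)
    n = len(scenes)

    # Cross-explanation matrix: entry (i, j) is the average log-likelihood
    # of scene j's clips under scene i's activity model.
    L = np.zeros((n, n))
    for i, si in enumerate(scenes):
        for j, sj in enumerate(scenes):
            clips = clips_by_scene[sj]
            L[i, j] = models[si].log_likelihood(clips) / max(len(clips), 1)

    # Symmetrise and map to a non-negative affinity, then cluster.
    S = 0.5 * (L + L.T)
    affinity = np.exp((S - S.max()) / (S.std() + 1e-8))
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)
    return dict(zip(scenes, labels))
```

Scenes that explain each other well land in the same cluster, within which a shared activity basis can then be learned and split into shared versus scene-specific activities.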

Highlights

  • The widespread use of public space CCTV camera systems has generated unprecedented amounts of data which can overwhelm human operators due to the sheer length of the surveillance videos and the large number of surveillance videos captured at different locations concurrently

  • Some of the key tasks addressed by automated surveillance video understanding include: (i) Behaviour profiling / scene understanding to reveal the typical activities and behaviours in the surveilled space [1], [2], [3], [4], [5]; (ii) Behaviour query by example, allowing the operator to search for occurrences similar to a specified example behaviour [1] (see the retrieval sketch after this list); (iii) Supervised learning to classify/annotate activities or behaviours if events of interest are annotated in a training dataset [2]; (iv) Summarization to compress lengthy surveillance video by removing redundancy

  • All of these tasks have generally been addressed within a single scene, or a group of adjacent scenes
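One of the tasks above, cross-scene query-by-example, reduces to nearest-neighbour retrieval once clips are represented over a shared activity basis. The sketch below is illustrative only; the clip-profile representation and cosine scoring are assumptions, not necessarily the paper's formulation.

```python
# Illustrative sketch: cross-scene query-by-example as nearest-neighbour
# retrieval over clip profiles expressed in a shared activity-topic basis.
# The representation and similarity measure are assumptions.
import numpy as np

def query_by_example(query_profile, clip_profiles, top_k=5):
    """Return the top_k clips (from any scene in the cluster) whose
    shared-topic profiles are most similar to the query clip's profile."""
    def unit(v):
        v = np.asarray(v, dtype=float)
        return v / (np.linalg.norm(v) + 1e-12)

    q = unit(query_profile)
    scored = [(key, float(q @ unit(p))) for key, p in clip_profiles.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)  # cosine similarity, high first
    return scored[:top_k]
```

Because the profiles live in the shared basis, the returned matches may come from scenes other than the one containing the query clip.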


Summary

INTRODUCTION

The widespread use of public space CCTV camera systems has generated unprecedented amounts of data, which can overwhelm human operators due to the sheer length of the surveillance videos and the large number of surveillance videos captured at different locations concurrently. Despite the clear potential benefits of exploiting multi-scene surveillance, it cannot be achieved with existing single-scene models [1], [2], [3], [4], [5]. These approaches learn an independent model for each scene and do not discover corresponding activities or behaviours across scenes even if they share the same semantic meaning. What is required is: (i) learning an activity representation that can be shared across scenes; (ii) modelling behaviours with the shared representation so they are comparable across scenes; and (iii) generalising surveillance tasks to the multi-scene case, including behaviour profiling/scene understanding, cross-scene query-by-example, cross-scene classification and multi-scene summarization. We define a novel jointly multi-scene approach to summarization that exploits the shared representation to compress redundancy both within and across scenes of each cluster.
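As a concrete illustration of the joint summarization idea, one simple strategy (an assumption for illustration, not necessarily the paper's exact objective) is to greedily select clips from all scenes in a cluster so that their shared-topic profiles cover the activity basis with minimal redundancy. The clip profiles and the coverage objective below are illustrative assumptions.

```python
# Illustrative sketch: greedy multi-scene summarization over a shared
# activity-topic basis. Clip profiles and the coverage objective are
# assumptions, not the paper's exact formulation.
import numpy as np

def summarize_cluster(clip_profiles, budget):
    """Pick up to `budget` clips (from any scene in the cluster) whose
    topic profiles jointly cover the shared basis with little redundancy.

    clip_profiles: dict mapping (scene_id, clip_id) -> 1-D topic vector.
    """
    keys = list(clip_profiles)
    P = np.stack([np.asarray(clip_profiles[k], dtype=float) for k in keys])
    P = P / (P.sum(axis=1, keepdims=True) + 1e-12)     # normalise each profile

    chosen, covered = [], np.zeros(P.shape[1])
    available = np.ones(len(keys), dtype=bool)
    for _ in range(min(budget, len(keys))):
        # Marginal coverage each clip would add beyond what is already covered.
        gains = np.maximum(P - covered, 0.0).sum(axis=1)
        gains[~available] = -np.inf                     # never re-select a clip
        best = int(np.argmax(gains))
        available[best] = False
        chosen.append(best)
        covered = np.maximum(covered, P[best])
    return [keys[i] for i in chosen]
```

Because clips from different scenes that express the same shared activities add little marginal coverage, the summary compresses redundancy across scenes as well as within each scene.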

RELATED WORK
LEARNING LOCAL SCENE ACTIVITIES
Video Clip Representation
Learning Local Activities with Topic Model
MULTI-LAYER ACTIVITY AND SCENE CLUSTERING
Scene Level Clustering
Learning A Shared Activity Topic Basis
CROSS-SCENE QUERY BY EXAMPLE AND CLASSIFICATION
MULTI-SCENE SUMMARIZATION
EXPERIMENTS
Multi-Layer Scene Clustering
Multi-Scene Summarization
Further Analysis
Auto-Selected vs. Fixed Number of Clusters
Findings
VIII. CONCLUSIONS