Training Data Extraction and Object Detection in Surveillance Scenario

Artur Wilkowski, Włodzimierz Kasprzak, Maciej Stefańczyk

doi:10.3390/s20092689

Journal: Sensors (Basel, Switzerland)	Publication Date: May 8, 2020
Citations: 8	License type: CC BY 4.0

Affiliation: Warsaw University of Technology

Abstract

Police and various security services use video analysis to secure public spaces and mass events, and to investigate criminal activity. Due to the huge amount of data supplied to surveillance systems, some degree of automatic data processing is a necessity. In one typical scenario, an operator marks an object in an image frame and searches for all occurrences of the object in other frames or even in other image sequences. This problem is hard in general. Algorithms supporting this scenario must reconcile several seemingly contradictory factors: training and detection speed, detection reliability, and learning from small data sets. In the system proposed here, we use a two-stage detector. The first (region proposal) stage is based on a Cascade Classifier, while the second (classification) stage is based on either Support Vector Machines (SVMs) or Convolutional Neural Networks (CNNs). The proposed configuration ensures both speed and detection reliability. In addition, an object tracking and background-foreground separation algorithm, supported by the GrabCut algorithm and a sample synthesis procedure, is used to collect rich training data for the detector. Experiments show that the system is effective, useful, and applicable to practical surveillance tasks.
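The two-stage idea can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the trained Cascade Classifier is replaced by a hypothetical hand-written stage (`mean_stage`), and the SVM/CNN by a hypothetical linear decision function (`linear_verifier`, score = w·f + b) with hand-picked weights. All function names, features, and thresholds are illustrative assumptions. What the sketch does show is the structure: cheap tests prune most candidate windows, and the more expensive classifier runs only on the surviving proposals.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win: int, step: int) -> List[Box]:
    """Enumerate square candidate windows, row by row."""
    return [
        (x, y, win, win)
        for y in range(0, img_h - win + 1, step)
        for x in range(0, img_w - win + 1, step)
    ]


def window_pixels(image: Image, box: Box) -> List[float]:
    """Flatten the pixels covered by a window (image indexed as [row][col])."""
    x, y, w, h = box
    return [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]


def mean_stage(image: Image, box: Box) -> bool:
    """Hypothetical cheap cascade stage: reject mostly dark windows."""
    px = window_pixels(image, box)
    return sum(px) / len(px) > 0.3


def linear_verifier(image: Image, box: Box) -> float:
    """Hypothetical stand-in for a trained linear SVM: score = w . f + b,
    with features f = (mean intensity, contrast) and hand-picked w, b."""
    px = window_pixels(image, box)
    features = (sum(px) / len(px), max(px) - min(px))
    weights, bias = (2.0, -1.0), -1.5
    return sum(wi * fi for wi, fi in zip(weights, features)) + bias


def two_stage_detect(
    image: Image,
    windows: List[Box],
    stages: List[Callable[[Image, Box], bool]],
    verifier: Callable[[Image, Box], float],
    threshold: float = 0.0,
) -> Tuple[List[Box], List[Box]]:
    # Stage 1 (cascade): all() short-circuits, so a window is dropped by the
    # first stage that rejects it and later stages never see it.
    proposals = [w for w in windows if all(s(image, w) for s in stages)]
    # Stage 2: the costlier classifier scores only the surviving proposals.
    detections = [w for w in proposals if verifier(image, w) > threshold]
    return proposals, detections


if __name__ == "__main__":
    # 8x8 toy frame with a bright 4x4 "object" in the top-right corner.
    frame = [[1.0 if r < 4 and c >= 4 else 0.0 for c in range(8)] for r in range(8)]
    props, dets = two_stage_detect(
        frame, sliding_windows(8, 8, win=4, step=2), [mean_stage], linear_verifier
    )
    print(props)  # [(2, 0, 4, 4), (4, 0, 4, 4), (4, 2, 4, 4)]
    print(dets)   # [(4, 0, 4, 4)]
```

On the toy frame, three of the nine windows pass the cascade, and the verifier keeps only the window that fully covers the object. In a real system both stages would be trained on the collected samples rather than hand-written.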

Highlights

Police and various security services use video analysis when investigating criminal activity
Fragmentation describes the ratio of good Hits to the number of segments in the ground-truth (GT) data
We had information about the total number of times the target appears
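One possible reading of the Fragmentation measure is sketched below in Python. The source gives only the ratio; the segment representation (inclusive frame ranges), the rule that a hit is "good" when it overlaps any GT segment, and the names `overlaps` and `fragmentation` are assumptions made for illustration. Under this reading the denominator is the total number of times the target appears, and a value above 1 suggests that single appearances were split into several hits.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive frame range (first, last)


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two inclusive frame ranges share at least one frame."""
    return a[0] <= b[1] and b[0] <= a[1]


def fragmentation(hits: List[Segment], gt: List[Segment]) -> float:
    """Good hits per ground-truth segment.

    Assumption: a hit is 'good' if it overlaps any GT segment; hits that
    overlap nothing (false alarms) are not counted.
    """
    if not gt:
        raise ValueError("ground truth must contain at least one segment")
    good_hits = sum(1 for h in hits if any(overlaps(h, g) for g in gt))
    return good_hits / len(gt)


if __name__ == "__main__":
    gt = [(10, 50), (100, 130)]  # the target appears twice
    hits = [(12, 20), (25, 48), (105, 128), (200, 210)]
    print(fragmentation(hits, gt))  # 1.5: first appearance split into two hits
```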

Summary

Introduction

Police and various security services use video analysis when investigating criminal activity. Ideally, an operator would mark only a single object in a selected image frame and initiate a search for occurrences of similar objects in other frames of the processed sequence or in different sequences. This imposes several constraints on the machine vision solution that need to be addressed. Although the tracking process works well, it can be observed that the generated mask does not correspond well to the ground-truth data. This effect could be attributed to a natural bias of the method towards the specific classes of objects it was trained on.
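To put a number on how well a generated mask matches the ground truth, a common choice is intersection over union (IoU). The source does not state which measure was used; the sketch below, including the name `mask_iou` and the convention that two empty masks score 1.0, is an assumption for illustration.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 1 = foreground, 0 = background


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    # Convention (assumption): two empty masks agree perfectly.
    return inter / union if union else 1.0


if __name__ == "__main__":
    gt = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    pred = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]  # spills right
    print(mask_iou(pred, gt))  # 4 shared pixels / 6 covered pixels
```

A mask that spills one column beyond the object scores about 0.67 here even though it covers the whole object, which is the kind of mismatch the paragraph describes.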

Methods

Results

Conclusion