Unsupervised Multi-Object Detection for Video Surveillance Using Memory-Based Recurrent Attention Networks

Zhen He, Hangen He

doi:10.3390/sym10090375

Open Access

Abstract

Video surveillance has become ubiquitous with the rapid development of artificial intelligence. Multi-object detection (MOD) is a key step in video surveillance and has been studied extensively. Most existing MOD algorithms follow the “divide and conquer” pipeline and use popular machine learning techniques to optimize algorithm parameters. However, this pipeline is usually suboptimal, since it decomposes the MOD task into several sub-tasks and does not optimize them jointly. In addition, the frequently used supervised learning methods rely on labeled data, which are scarce and expensive to obtain. We therefore propose an end-to-end Unsupervised Multi-Object Detection framework for video surveillance, in which a neural model learns to detect objects in each video frame by minimizing the image reconstruction error. Moreover, we propose a Memory-Based Recurrent Attention Network to ease detection and training. The proposed model was evaluated on both synthetic and real datasets, exhibiting its potential.
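
The training signal described in the abstract, an image reconstruction error, can be illustrated with a small sketch. It is not the authors' implementation: mean squared error is an assumed choice of error, and `render_boxes` is a hypothetical stand-in for a learned renderer that paints a frame from a list of detected boxes.

```python
def reconstruction_error(frame, reconstruction):
    """Mean squared error between a frame and its reconstruction.

    Both arguments are equally sized 2-D lists of pixel intensities.
    MSE is an assumption; the abstract does not name a specific error.
    """
    if len(frame) != len(reconstruction):
        raise ValueError("frame and reconstruction differ in height")
    total, count = 0.0, 0
    for row_a, row_b in zip(frame, reconstruction):
        if len(row_a) != len(row_b):
            raise ValueError("frame and reconstruction differ in width")
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def render_boxes(height, width, boxes, background=0.0):
    """Hypothetical renderer: paint (top, left, h, w, value) boxes on a canvas."""
    canvas = [[background] * width for _ in range(height)]
    for top, left, h, w, value in boxes:
        for y in range(max(top, 0), min(top + h, height)):
            for x in range(max(left, 0), min(left + w, width)):
                canvas[y][x] = value
    return canvas


# A toy "frame" containing one bright 2x2 object.
frame = render_boxes(6, 6, [(1, 1, 2, 2, 1.0)])

# A detection at the right place reconstructs the frame exactly,
# while a misplaced detection leaves a residual error.
well_placed = render_boxes(6, 6, [(1, 1, 2, 2, 1.0)])
misplaced = render_boxes(6, 6, [(3, 3, 2, 2, 1.0)])

print(reconstruction_error(frame, well_placed))
print(reconstruction_error(frame, misplaced))
```

The point of the sketch is only that a lower reconstruction error corresponds to better-placed boxes, which is why the error can supervise detection without labels.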

Highlights

Video surveillance aims to analyze video data recorded by cameras.
Classical methods such as Deformable Part Models (DPMs) [1] follow the “divide and conquer” pipeline, in which a sliding-window approach first generates image regions, a classifier categorizes each region as object or non-object, and post-processing refines the bounding boxes of the object regions. To improve both the efficiency and the performance of multi-object detection (MOD), methods based on Region-based Convolutional Neural Networks (R-CNNs) [3,4,5,6] have been proposed, and they perform well on various popular object detection datasets [7,8,9,10,11].
To quantitatively assess the model, we evaluated different configurations with the commonly used MOD metrics, including the Average Precision (AP) [9], Multi-Object Detection Accuracy (MODA), Multi-Object Detection Precision (MODP) [28], average False Alarm number per Frame (FAF), total True Positive number (TP), total False Positive number (FP), total False Negative number (FN), Precision (TP/(TP + FP)), and Recall (TP/(TP + FN)).
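
The count-based metrics listed above can be computed from sequence-level totals, as in the following sketch. Precision and Recall follow the formulas given in the text. The FAF and MODA lines are assumptions based on the usual definitions (false positives per frame, and one minus the ratio of misses plus false positives to ground-truth objects, with unit error costs); the function name and arguments are illustrative. AP and MODP are omitted because they need per-detection scores and overlaps, not just counts.

```python
def detection_metrics(tp, fp, fn, num_frames):
    """Summarize detection counts accumulated over a whole sequence.

    tp, fp, fn: total true positive, false positive, false negative counts.
    num_frames: number of evaluated frames (must be positive).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")

    # Every ground-truth object is either matched (TP) or missed (FN).
    num_ground_truth = tp + fn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0

    # Assumed definition: average number of false alarms per frame.
    faf = fp / num_frames

    # Assumed definition: MODA with unit costs for misses and false alarms.
    moda = 1.0 - (fn + fp) / num_ground_truth if num_ground_truth else 0.0

    return {"precision": precision, "recall": recall, "faf": faf, "moda": moda}


# Illustrative counts only; these are not results from the paper.
metrics = detection_metrics(tp=80, fp=10, fn=20, num_frames=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that MODA can become negative when false positives and misses together outnumber the ground-truth objects, whereas Precision and Recall stay within [0, 1].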

Summary

Introduction

Video surveillance aims to analyze video data recorded by cameras. It has been widely used in crime prevention, industrial processes, traffic monitoring, sporting events, etc. Multi-object detection (MOD) from visual data has been studied extensively for many years by the computer vision community. Classical methods such as Deformable Part Models (DPMs) [1] follow the “divide and conquer” pipeline, in which a sliding-window approach first generates image regions, a classifier (e.g., a Support Vector Machine [2]) categorizes each region as object or non-object, and post-processing refines the bounding boxes of the object regions (e.g., removing outliers, merging duplicates, and rectifying boundaries). We assess the proposed model on both a synthetic dataset (Sprites) and a real dataset (DukeMTMC [17]), exhibiting its advantages and practicality.
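
The three stages of the classical “divide and conquer” pipeline described above can be sketched as follows. This is a toy illustration, not the DPM algorithm: `toy_score` is a hypothetical stand-in for a trained classifier such as an SVM, and greedy non-maximum suppression is used here as one common way to merge duplicate boxes. All thresholds are assumed values.

```python
def sliding_windows(height, width, size, stride):
    """Stage 1: yield square (top, left, bottom, right) candidate regions."""
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield (top, left, top + size, left + size)


def toy_score(image, box):
    """Stage 2 stand-in (hypothetical): mean intensity as an object score."""
    top, left, bottom, right = box
    pixels = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    return sum(pixels) / len(pixels)


def iou(a, b):
    """Intersection over union of two (top, left, bottom, right) boxes."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def non_max_suppression(scored_boxes, iou_threshold):
    """Stage 3: keep the best-scoring box among heavily overlapping ones."""
    kept = []
    for score, box in sorted(scored_boxes, reverse=True):
        if all(iou(box, other) <= iou_threshold for _, other in kept):
            kept.append((score, box))
    return kept


def detect(image, size=2, stride=1, score_threshold=0.5, iou_threshold=0.3):
    """Run the three stages in sequence; each is tuned independently."""
    height, width = len(image), len(image[0])
    candidates = [(toy_score(image, box), box)
                  for box in sliding_windows(height, width, size, stride)]
    objects = [(s, b) for s, b in candidates if s >= score_threshold]
    return [box for _, box in non_max_suppression(objects, iou_threshold)]


# Toy 6x6 image with two bright 2x2 objects.
image = [[0.0] * 6 for _ in range(6)]
for y, x in [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (3, 4), (4, 3), (4, 4)]:
    image[y][x] = 1.0

print(sorted(detect(image)))
```

Each stage has its own hand-set parameters (window size, score threshold, overlap threshold), which illustrates why the text calls the pipeline suboptimal: the stages are not optimized jointly.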

Unsupervised Multi-Object Detection

Image Encoder

Recurrent Object Detector

Renderer

Memory-Based Recurrent Attention Networks

Experiments

Sprites

DukeMTMC

Visualizing the UMOD-MRAN

Related Work

Conclusions

Journal: Symmetry
Publication Date: Sep 1, 2018
Citations: 8
License type: CC BY 4.0
