MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

Guanxiong Sun, Guosheng Hu, Neil Robertson, Yang Hua

doi:10.1609/aaai.v35i3.16365

Abstract

State-of-the-art video object detection methods maintain a memory structure, either a sliding window or a memory queue, to enhance the current frame using attention mechanisms. However, we argue that these memory structures are neither efficient nor sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, which leads to a heavy computational cost; and (2) frame-wise memory updating, which prevents the memory from capturing more temporal information. In this paper, we propose MAMBA, a multi-level aggregation architecture built on a memory bank. Specifically, our memory bank employs two novel operations to eliminate the disadvantages of existing methods: (1) light-weight key-set construction, which significantly reduces the computational cost; and (2) a fine-grained feature-wise updating strategy, which enables our method to utilize knowledge from the whole video. To better enhance features from complementary levels, i.e., feature maps and proposals, we further propose a generalized enhancement operation (GEO) to aggregate multi-level features in a unified manner. We conduct extensive evaluations on the challenging ImageNet VID dataset. Compared with existing state-of-the-art methods, our method achieves superior performance in terms of both speed and accuracy. More remarkably, MAMBA achieves an mAP of 83.7%/84.6% at 12.6/9.1 FPS with ResNet-101.
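The two memory-bank operations and the attention-based enhancement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the key set is chosen or which stored features are overwritten, so random sampling and random slot replacement are assumptions here, and plain scaled dot-product attention with a residual connection stands in for GEO. The names `MemoryBank`, `key_set`, `update`, and `enhance` are hypothetical.

```python
import math
import random


def enhance(queries, keys):
    """Add to each query an attention-weighted average of the keys.

    Assumption: scaled dot-product attention plus a residual connection,
    used only as a stand-in for the paper's GEO.
    """
    dim = len(queries[0])
    enhanced = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        context = [sum(w * k[i] for w, k in zip(weights, keys)) / total
                   for i in range(dim)]
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
    return enhanced


class MemoryBank:
    """Fixed-capacity store of individual feature vectors (hypothetical)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.features = []
        self.rng = random.Random(seed)

    def key_set(self, size):
        # Light-weight key set: attend to a small subset, not the whole memory.
        # Assumption: the subset is drawn uniformly at random.
        return self.rng.sample(self.features, min(size, len(self.features)))

    def update(self, new_features):
        # Feature-wise update: once full, overwrite single slots instead of
        # evicting a whole frame. Assumption: the slot is chosen at random.
        for feature in new_features:
            if len(self.features) < self.capacity:
                self.features.append(feature)
            else:
                self.features[self.rng.randrange(self.capacity)] = feature


# Toy run: five "frames", each with four 3-dimensional features.
bank = MemoryBank(capacity=8)
data_rng = random.Random(1)
for _ in range(5):
    frame = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
    keys = bank.key_set(size=4)
    output = enhance(frame, keys) if keys else frame  # first frame has no memory
    bank.update(frame)
```

In this sketch the attention cost per frame depends on the key-set size rather than on the bank capacity, and features from early frames can persist in the bank for as long as their slots are not overwritten.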


Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 23

Similar Papers

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation
29 Dec 2020

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation
Jaekyum Kim ... Junho Koh
10 Jan 2021

Construction of customer-aware grid interactive service system based on neural network model
Baowei Zhou ... Sheng Cao
Applied Mathematics and Nonlinear Sciences | VOL. 9
02 Oct 2023

Driving Fatigue Detection Based on the Combination of Multi-Branch 3D-CNN and Attention Mechanism
Wenbin Xiang ... Feiyang Li
Applied sciences | VOL. 12
06 May 2022
