Temporal Context Enhanced Feature Aggregation for Video Object Detection

Fei He, Senyao Du, Xin Zhao, Kaiqi Huang, Naiyu Gao, Qiaozhe Li

doi:10.1609/aaai.v34i07.6727

Abstract

Video object detection is challenging because object appearance deteriorates in certain video frames. One typical solution is to aggregate features from neighboring frames to enhance per-frame appearance features. However, such methods ignore the temporal relations between the aggregated frames, which are critical for improving video recognition accuracy. To handle the appearance deterioration problem, this paper proposes a temporal context enhanced network (TCENet) that exploits temporal context information through temporal aggregation for video object detection. To handle the displacement of objects in videos, a novel DeformAlign module is proposed to align spatial features from frame to frame. Instead of adopting a fixed-length window fusion strategy, a temporal stride predictor is proposed to adaptively select video frames for aggregation, which makes it possible to exploit variable temporal information and to achieve better results with fewer aggregated frames. Our TCENet achieves state-of-the-art performance on the ImageNet VID dataset and has a faster runtime. Without bells and whistles, TCENet achieves 80.3% mAP by aggregating only 3 frames.
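The general idea of stride-based feature aggregation described in the abstract can be illustrated with a minimal sketch. This is not the TCENet implementation: the abstract does not specify how features are weighted, so the cosine-similarity softmax weighting below is a swapped-in, commonly used choice, the stride is passed in as a plain argument rather than predicted, and feature alignment (the role of DeformAlign) is omitted entirely. The names `select_frames`, `cosine`, and `aggregate` are hypothetical.

```python
import math

def select_frames(t, stride, num_frames, k=3):
    """Pick k frame indices centred on t, spaced by `stride`, clamped to the video."""
    half = k // 2
    return [min(max(t + i * stride, 0), num_frames - 1)
            for i in range(-half, half + 1)]

def cosine(a, b):
    """Cosine similarity between two flat feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def aggregate(features, t, stride, k=3):
    """Weighted average of the selected frames' features.

    Assumption: weights are a softmax over cosine similarity to the
    reference frame t. The paper's actual weighting is not given in the abstract.
    """
    idx = select_frames(t, stride, len(features), k)
    ref = features[t]
    sims = [cosine(features[i], ref) for i in idx]
    m = max(sims)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * features[i][d] for w, i in zip(weights, idx))
            for d in range(len(ref))]
```

With `k=3`, a larger stride widens the temporal span covered by the same number of frames, which is the intuition behind selecting frames adaptively instead of using a fixed-length window.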

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 24

Similar Papers

Modeling Long- and Short-Term Temporal Context for Video Object Detection
Chen Zhang ... Joohee Kim
01 Sep 2019

Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review.
Dengshan Li ... Rujing Wang
Micromachines | VOL. 13
31 Dec 2021

A feature temporal attention based interleaved network for fast video object detection
Yanni Yang ... Qin Shi
Journal of Ambient Intelligence and Humanized Computing | VOL. 14
11 May 2021

When Few-Shot Learning Meets Video Object Detection
Zhongjie Yu ... Lin Chen
21 Aug 2022