Context Enhanced Transformer for Single Image Object Detection in Video Data

Seungjun An, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim, Gyeongnyeon Kim, Seonghoon Park

doi:10.1609/aaai.v38i2.27825

Abstract

With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), which incorporates temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across data. Additionally, we present a classification-based sampling technique to selectively utilize the relevant memory for the current image. At test time, we introduce a memory adaptation method that updates individual memory functions by considering the test distribution. Experiments on the CityCam and ImageNet VID datasets demonstrate the efficiency of the framework on various video systems. The project page and code will be made available at: https://ku-cvlab.github.io/CETR.
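
To make the three ideas named in the abstract more concrete (a class-wise memory, classification-based sampling, and test-time memory adaptation), the following is a minimal toy sketch, not the authors' implementation. The class name `ClassWiseMemory`, the exponential-moving-average update rule, the `momentum` parameter, and the top-k selection are all assumptions made purely for illustration; the abstract does not specify how CETR realizes any of them.

```python
from typing import Dict, List, Sequence


class ClassWiseMemory:
    """Toy class-wise memory: one running-average feature vector per class.

    Hypothetical illustration only; not the CETR memory module.
    """

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9) -> None:
        self.momentum = momentum
        # One slot per class, initialised to zeros.
        self.slots: List[List[float]] = [[0.0] * dim for _ in range(num_classes)]

    def update(self, class_id: int, feature: Sequence[float]) -> None:
        # Exponential moving average of features seen for this class.
        # The EMA rule is an assumption, chosen for simplicity.
        m = self.momentum
        slot = self.slots[class_id]
        self.slots[class_id] = [m * s + (1.0 - m) * f for s, f in zip(slot, feature)]

    def sample(self, class_scores: Sequence[float], k: int) -> Dict[int, List[float]]:
        # Classification-based sampling (assumed form): keep only the k
        # classes that the classifier scores highest for the current image.
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c], reverse=True)
        return {c: self.slots[c] for c in ranked[:k]}


# Fill the memory with features collected across training data.
memory = ClassWiseMemory(num_classes=3, dim=2, momentum=0.5)
memory.update(0, [1.0, 1.0])
memory.update(2, [4.0, 0.0])

# Select the memory slots relevant to the current image.
selected = memory.sample(class_scores=[0.7, 0.1, 0.2], k=2)

# Test-time adaptation (assumed form): keep updating a slot with
# features observed on the test distribution.
memory.update(0, [3.0, 3.0])
```

In this sketch the detector would attend only to the slots returned by `sample`, so a single input image can draw on context gathered from other frames without the model taking multiple frames as input.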

More From: Proceedings of the AAAI Conference on Artificial Intelligence
