Attention-Guided Disentangled Feature Aggregation for Video Object Detection.

Shishir Muralidhara, Didier Stricker, Muhammad Zeshan Afzal, Marcus Liwicki, Khurram Azeem Hashmi, Alain Pagani

doi:10.3390/s22218583

Abstract

Object detection is a computer vision task that involves localisation and classification of objects in an image. Video data implicitly introduces several challenges, such as blur, occlusion and defocus, making video object detection more challenging in comparison to still image object detection, which is performed on individual and independent images. This paper tackles these challenges by proposing an attention-heavy framework for video object detection that aggregates the disentangled features extracted from individual frames. The proposed framework is a two-stage object detector based on the Faster R-CNN architecture. The disentanglement head integrates scale, spatial and task-aware attention and applies it to the features extracted by the backbone network across all the frames. Subsequently, the aggregation head incorporates temporal attention and improves detection in the target frame by aggregating the features of the support frames. These include the features extracted from the disentanglement network along with the temporal features. We evaluate the proposed framework using the ImageNet VID dataset and achieve a mean Average Precision (mAP) of 49.8 and 52.5 using the backbones of ResNet-50 and ResNet-101, respectively. The improvement in performance over the individual baseline methods validates the efficacy of the proposed approach.
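
To make the aggregation step more concrete, the sketch below shows one generic way a target-frame feature could be fused with support-frame features using similarity-based attention weights. It is an illustrative assumption, not the paper's aggregation head: the function names (`cosine`, `softmax`, `aggregate`), the use of cosine similarity, and the flat per-frame feature vectors are all hypothetical simplifications chosen for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as unrelated
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(target, supports):
    """Fuse a target-frame feature with support-frame features.

    Hypothetical simplification: each frame is a single flat vector, and
    the attention weight of a frame is the softmax of its cosine
    similarity to the target frame. The target itself is included, so
    its own feature always contributes to the fused result.
    """
    frames = [target] + list(supports)
    weights = softmax([cosine(target, f) for f in frames])
    fused = [
        sum(w * f[i] for w, f in zip(weights, frames))
        for i in range(len(target))
    ]
    return fused, weights


if __name__ == "__main__":
    target = [1.0, 0.0]
    supports = [[1.0, 0.0], [0.0, 1.0]]  # one similar, one dissimilar frame
    fused, weights = aggregate(target, supports)
    print(fused, weights)
```

In this toy setup, a support frame that resembles the target receives a larger weight than one that does not, which is the intuition behind using support frames to compensate for blur or occlusion in the target frame.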

Journal: Sensors (Basel, Switzerland)
Publication Date: Nov 7, 2022
Citations: 3
License type: CC BY 4.0

Similar Papers

Object Detection in Video with Spatiotemporal Sampling Networks
Gedas Bertasius ... Jianbo Shi
01 Jan 2018

PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection
Han Wang ... Shanyan Guan
01 Jan 2021

Object Detection in Videos with Tubelet Proposal Networks
Kai Kang ... Hongsheng Li
01 Jul 2017

Feature Aligned Recurrent Network for Causal Video Object Detection
Yifei Wang ... Ming Zhang
01 Sep 2019
