Multi-level Proposal Relations Aggregation for Video Object Detection

Chongkai Yu, Wenjie Chen, Bing Wu

doi:10.1007/978-3-031-15919-0_61

Abstract

Video quality often deteriorates in certain frames, which poses a great challenge for object detection: it is difficult to identify an object in a degraded frame using the information of that frame alone. Recently, many studies have shown that aggregating context information through the self-attention mechanism can enhance the features of key frames. However, these methods exploit only part of the inter-video and intra-video global-local information, not all of it. Global semantic and local localization information in the same video can assist object classification and regression. The proposal relations among different videos can provide important cues to distinguish confusing objects. All of this information can improve the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.
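
The idea of enhancing key-frame proposal features by attending over proposals from other frames can be illustrated with a minimal sketch. This is not the authors' relation module: it assumes plain scaled dot-product attention with a residual connection, and it omits learned projections, multiple heads, geometric terms, and the relation regularization mentioned above. The function name `aggregate_proposals` and the toy feature vectors are hypothetical.

```python
import math

def aggregate_proposals(key_feats, support_feats):
    """Enhance each key-frame proposal with attention over support proposals.

    key_feats:     list of D-dimensional proposal features from the key frame
    support_feats: list of D-dimensional proposal features from other frames
    Returns (enhanced key features, attention weights per key proposal).
    """
    dim = len(key_feats[0])
    enhanced, all_weights = [], []
    for q in key_feats:
        # Scaled dot-product similarity between this proposal and every support proposal.
        scores = [sum(a * b for a, b in zip(q, s)) / math.sqrt(dim)
                  for s in support_feats]
        # Softmax, shifted by the maximum score for numerical stability.
        top = max(scores)
        exps = [math.exp(x - top) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of support features.
        context = [sum(w * s[j] for w, s in zip(weights, support_feats))
                   for j in range(dim)]
        # Residual connection: keep the original feature and add the context.
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
        all_weights.append(weights)
    return enhanced, all_weights

# Toy example: two key-frame proposals, three support proposals.
key = [[1.0, 0.0], [0.0, 1.0]]
support = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced, weights = aggregate_proposals(key, support)
```

In this sketch, support proposals that are more similar to a key-frame proposal receive larger weights, so their features contribute more to the enhanced feature.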

Similar Papers

Memory Enhanced Global-Local Aggregation for Video Object Detection
Yihong Chen ... Han Hu
01 Jun 2020

Integration of global and local information in videos for key frame extraction
Dianting Liu ... Chao Chen
01 Aug 2010

The use of deep learning and mean shift to learn global and local processing in human visual perception
Wei-Wen Hsu ... Min Zhang
01 Oct 2016

Image Visual Attention Mechanism-based Global and Local Semantic Information Fusion for Multi-modal English Machine Translation
Xiaobin Guo
電腦學刊 | VOL. 33
01 Apr 2022
