Video Object Detection Using Object’s Motion Context and Spatio-Temporal Feature Aggregation

Junho Koh

doi:10.48448/w4jk-ks69

Abstract

Deep learning techniques have recently led to significant improvements in object detection accuracy. In many applications, object detection is performed on video data consisting of a sequence of two-dimensional (2D) image frames. Numerous object detection schemes have been designed to detect objects independently in each video frame. Although the temporal information in adjacent frames can be exploited in a subsequent object tracking stage, it has been shown that object detection accuracy can be significantly improved by exploiting the temporal structure of the image sequence in the detection stage itself. In this paper, we propose a novel video object detection method that exploits both the motion context inferred from adjacent frames and the spatio-temporal features aggregated over the image sequence. First, the correlation between the spatial feature maps of adjacent frames is computed, and an embedding vector representing the motion context is obtained by encoding the N correlation maps with a long short-term memory (LSTM) network. In addition to the motion context, the spatial feature maps of N+1 consecutive frames are aggregated to boost the quality of the feature map. A gated attention network is employed to selectively combine the temporal feature maps according to their relevance to the feature map of the current frame. While most video object detection methods have been built on two-stage object detectors, the proposed idea applies to one-stage detectors, which offer low computational complexity for practical real-time applications. Our numerical evaluation on the ImageNet object detection from video (VID) dataset demonstrates that the proposed network achieves significant performance gains over the baseline algorithms and outperforms existing one-stage video object detectors.
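The abstract names two feature-level operations (adjacent-frame correlation and relevance-weighted aggregation of N+1 feature maps) without giving their exact form. The sketch below illustrates both under stated assumptions, not as the authors' implementation. The names `correlation`, `motion_sequence` and `aggregate` are hypothetical; the correlation is assumed to be a FlowNet-style cost volume (channel-averaged dot products over a small displacement window); the learned gated attention network is replaced by a parameter-free softmax over cosine similarities to the current frame; the LSTM that encodes the correlation maps is not shown. Feature maps are plain nested lists (H x W x C) so the example has no dependencies.

```python
import math

# Hypothetical layout: a feature map is a list of H rows, each a list of W
# feature vectors of length C (plain floats), i.e. H x W x C.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def correlation(f_cur, f_prev, max_disp=1):
    """Assumed FlowNet-style correlation between two adjacent feature maps.
    For each position and each displacement (dy, dx) in [-max_disp, max_disp]^2,
    store the channel-averaged dot product of f_cur[y][x] and
    f_prev[y + dy][x + dx]; out-of-bounds neighbours score 0.0.
    Output shape: H x W x (2 * max_disp + 1) ** 2."""
    h, w, c = len(f_cur), len(f_cur[0]), len(f_cur[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = []
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        scores.append(dot(f_cur[y][x], f_prev[yy][xx]) / c)
                    else:
                        scores.append(0.0)
            row.append(scores)
        out.append(row)
    return out

def motion_sequence(frames, max_disp=1):
    """Given N+1 feature maps ordered oldest to newest, return the N
    correlation maps between adjacent pairs. Per the abstract, such a
    sequence is encoded by an LSTM; that LSTM is not sketched here."""
    return [correlation(frames[i + 1], frames[i], max_disp)
            for i in range(len(frames) - 1)]

def cosine(a, b, eps=1e-8):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)

def aggregate(f_cur, past_maps):
    """Relevance-weighted aggregation of the current map and N past maps.
    Stand-in for the learned gated attention network: at every position,
    each frame's weight is a softmax over its cosine similarity to the
    current frame, so frames that disagree with the present one count less."""
    frames = [f_cur] + list(past_maps)
    h, w, c = len(f_cur), len(f_cur[0]), len(f_cur[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sims = [cosine(f_cur[y][x], f[y][x]) for f in frames]
            m = max(sims)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - m) for s in sims]
            total = sum(exps)
            weights = [e / total for e in exps]
            row.append([sum(wt * f[y][x][k] for wt, f in zip(weights, frames))
                        for k in range(c)])
        out.append(row)
    return out
```

With `max_disp=1` each position receives 9 correlation scores, one per displacement. In a trained model, the cosine softmax in `aggregate` would be replaced by the learned gating network, and each map returned by `motion_sequence` would be flattened into one LSTM input step.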

Similar Papers

Video Object Detection Using Object's Motion Context and Spatio-Temporal Feature Aggregation
Jaekyum Kim ... Jun Won Choi
10 Jan 2021

Video Object Detection Using Motion Context and Feature Aggregation
Jaekyum Kim ... Jun Won Choi
21 Oct 2020

Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review
Dengshan Li ... Rujing Wang
Micromachines, Vol. 13, 31 Dec 2021

Video Object Detection and Tracking based on Angle Consistency between Motion and Flow
Toshiki Seo ... Hironobu Fujiyoshi
19 Oct 2020