Abstract

Video object detection still faces several challenges. For example, the imbalance between positive and negative samples lowers information-processing efficiency, and detection performance declines under abnormal conditions in video. This paper addresses these challenges with video object detection based on local attention. We propose a local attention sequence model and optimize the parameters and computation of ConvGRU, so that spatial and temporal information in videos is processed more efficiently and detection performance under abnormal conditions ultimately improves. Experiments on ImageNet VID show that our method improves detection accuracy by 5.3%, and visualization results show that the method adapts to different abnormal conditions, thereby improving the reliability of video object detection.
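The abstract refers to a ConvGRU whose parameters and computation are optimized; the paper's specific modifications are not reproduced here. As a point of reference only, the following is a minimal sketch of a standard ConvGRU cell in PyTorch, where the class name, channel counts, and kernel size are illustrative assumptions rather than the authors' configuration.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Standard ConvGRU cell: the gates are computed with 2-D convolutions,
    so the hidden state keeps its spatial layout (a feature map)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        pad = k // 2
        # update (z) and reset (r) gates, computed from [x_t, h_{t-1}]
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)
        # candidate hidden state, computed from [x_t, r * h_{t-1}]
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        # convex combination of the previous state and the candidate
        return (1 - z) * h + z * h_tilde

In a video detector, such a cell would be applied to per-frame backbone feature maps, carrying the hidden state forward from frame to frame.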

Highlights

  • Object detection is a fundamental problem in computer vision and has been widely used in the fields of surveillance, robotics, medical intelligence, etc.

  • Instead of relying on optical flow, we propose an innovative video object detection model based on local attention

  • Comparative results show that, with the local attention sequence model, the video object detection model better handles the detection difficulties caused by occlusion during object motion, pose changes, and blur introduced by camera movement

Summary

Introduction

Object detection is a fundamental problem in computer vision and has been widely used in the fields of surveillance, robotics, medical intelligence, etc. Redmon proposed the YOLO [5] detection framework in 2015, which divides the image into a grid and simultaneously predicts the object category and bounding box for each grid cell. Applying such image-based object detectors to videos is often unsatisfactory because of the deteriorated appearance caused by motion blur, out-of-focus cameras, and rare poses frequently encountered in videos. Existing methods that leverage temporal information for video object detection usually use optical flow to propagate high-level features across frames. Our contribution is as follows: we introduce a novel video object detector based on local attention that establishes spatial and temporal correspondence across frames without extra optical flow models.
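To make the flow-free idea concrete, the sketch below aggregates reference-frame features into the current frame with dot-product attention restricted to a local spatial window, instead of warping features with optical flow. It is a minimal illustration, not the paper's exact formulation; the window size, the scaled dot-product similarity, and the function name are assumptions.

import torch
import torch.nn.functional as F

def local_attention_aggregate(query_feat, ref_feat, window=5):
    """Aggregate reference-frame features into the current frame using
    attention restricted to a (window x window) neighbourhood around
    each spatial location.
    query_feat, ref_feat: (B, C, H, W) feature maps from adjacent frames."""
    B, C, H, W = query_feat.shape
    pad = window // 2
    # Gather the local neighbourhood of every reference position:
    # (B, C * window * window, H*W) -> (B, C, window*window, H*W)
    neigh = F.unfold(ref_feat, kernel_size=window, padding=pad)
    neigh = neigh.view(B, C, window * window, H * W)
    q = query_feat.view(B, C, 1, H * W)
    # Scaled dot-product similarity of each query position to its neighbourhood.
    attn = (q * neigh).sum(dim=1, keepdim=True) / C ** 0.5   # (B, 1, k*k, H*W)
    attn = attn.softmax(dim=2)
    # Weighted sum of neighbourhood features.
    out = (attn * neigh).sum(dim=2)                           # (B, C, H*W)
    return out.view(B, C, H, W)

The aggregated features could then be fused with the current frame (for example, fed to a recurrent unit such as a ConvGRU) before the detection head, avoiding a separate optical-flow network.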

Video Object Detection
Self-Attention
Spatial Attention
Results
Ablation Study
Conclusions