Spatiotemporal key region transformer for visual tracking

Ruixu Wu, Xianbin Wen, Liming Yuan, Haixia Xu

doi:10.1007/s40747-023-01040-4

Abstract

Visual tracking is an important field of computer vision research. Although transformer-based trackers have achieved remarkable performance, the transformer structure has notable weaknesses: its global computation is inefficient, it does not screen for important patches, and it cannot focus on key target regions. At the same time, temporal motion features are easily overlooked. To solve these problems, this paper proposes a new method, SKRT, that removes the CNN structure and directly uses a transformer as the backbone network to extract multiframe video features. These feature maps are then mixed and superimposed to obtain spatiotemporal information. To focus on important parts efficiently, we use key region extraction to obtain a small set of template and search feature map patches and feed them back into the transformer for cross-correlation computation. Finally, we predict the position of the tracked object through center-corner prediction. To demonstrate the effectiveness of our method, we conduct experiments on challenging benchmark datasets (GOT-10K, TrackingNet, VOT2018, OTB100, LaSOT), and the results show that SKRT is competitive with other state-of-the-art methods.
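The key region extraction step can be illustrated with a minimal sketch. The abstract does not state how patches are scored or selected, so everything here is an assumption made for illustration: each search patch is scored by its maximum cosine similarity to any template patch, and the top-k patches are kept. The names `patch_scores` and `select_key_patches` are hypothetical and are not taken from the SKRT implementation.

```python
import math


def patch_scores(template, search):
    """Score each search patch by its best cosine similarity to a template patch.

    Assumption: cosine similarity stands in for whatever importance
    measure the paper actually uses.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    return [max(cosine(s, t) for t in template) for s in search]


def select_key_patches(scores, k):
    """Return the indices of the k highest-scoring patches, in spatial order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy data: 2 template patches and 4 search patches with 2-D embeddings.
template = [[1.0, 0.0], [0.9, 0.1]]
search = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [0.8, 0.2]]

scores = patch_scores(template, search)
key_idx = select_key_patches(scores, k=2)
print(key_idx)  # [1, 3]
```

Only the patches at `key_idx` would then be passed to the later attention stage, which is what reduces the cost relative to attending over every patch.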

Journal: Complex & Intelligent Systems
Publication Date: Apr 10, 2023
Citations: 1
License type: CC BY 4.0
