Abstract

Transformer-based visual tracking algorithms have developed rapidly thanks to the self-attention mechanism of the Transformer, which can model global information. Although self-attention effectively captures long-range dependencies in the spatial feature space, it operates only on flattened two-dimensional features and therefore cannot capture long-range temporal dependencies. Moreover, because self-attention acts as a low-pass filter, it picks up the low-frequency features of the target while ignoring high-frequency ones. To address these problems, this paper proposes a Transformer tracker based on action information and mix-frequency features (AMTrack). Specifically, to compensate for the missing long-range temporal dependencies, we introduce a target action aware module and a target action offset module. The target action aware module sets up several pathways to extract spatio-temporal, channel, and motion features independently, while the target action offset module derives the target’s offset information by computing relative feature maps. Furthermore, to address the imbalance between high- and low-frequency features, we propose a mix-frequency attention and a multi-frequency self-attention convolutional block. The mix-frequency attention feeds high-frequency features within partitioned local windows to a high-frequency branch and average-pooled low-frequency features to a low-frequency branch, computing attention scores in each branch separately. The multi-frequency self-attention convolutional block uses self-attention to capture low-frequency features and convolution to capture high-frequency features. Extensive experiments on eight challenging tracking benchmarks (OTB100 (Object Tracking Benchmark 100), NFS (Need For Speed), UAV123 (Unmanned Aerial Vehicles 123), TC128 (Temple Color 128), VOT2018 (Visual Object Tracking 2018), LaSOT (Large-scale Single Object Tracking), TrackingNet (Tracking Network), and GOT-10k (Generic Object Tracking-10k)) show that our tracker achieves excellent performance compared with several state-of-the-art trackers. On LaSOT, the AUC (Area Under Curve), normalized precision (PNorm), and precision (P) reach 65.8%, 69.2%, and 68.0%, respectively, with the AUC 2.1% higher than that of the baseline TrDiMP (Transformer Discriminative Model Prediction). Our tracker also achieves excellent tracking performance on the other datasets.
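To make the two-branch mix-frequency attention described above concrete, the following PyTorch fragment is a minimal sketch of that idea: windowed self-attention over local (high-frequency) features and self-attention over average-pooled (low-frequency) features, with the two outputs fused. The window size, pooling resolution, and channel-concatenation fusion are our assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixFrequencyAttention(nn.Module):
    """Sketch of a two-branch mix-frequency attention (assumed design)."""

    def __init__(self, dim, num_heads=4, window_size=7, pool_size=7):
        super().__init__()
        self.window_size = window_size
        self.pool_size = pool_size
        # High-frequency branch: self-attention inside partitioned local windows.
        self.high_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Low-frequency branch: self-attention over average-pooled tokens.
        self.low_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)  # fuse the concatenated branches

    def forward(self, x):
        # x: (B, C, H, W); H and W assumed divisible by window_size.
        B, C, H, W = x.shape
        ws = self.window_size

        # High-frequency branch: attention scores within each local window.
        win = x.view(B, C, H // ws, ws, W // ws, ws)
        win = win.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)
        hi, _ = self.high_attn(win, win, win)
        hi = hi.reshape(B, H // ws, W // ws, ws, ws, C)
        hi = hi.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

        # Low-frequency branch: attention scores over average-pooled features,
        # then upsampling back to the original spatial resolution.
        low = F.adaptive_avg_pool2d(x, self.pool_size)   # (B, C, p, p)
        low = low.flatten(2).transpose(1, 2)             # (B, p*p, C)
        low, _ = self.low_attn(low, low, low)
        low = low.transpose(1, 2).reshape(B, C, self.pool_size, self.pool_size)
        low = F.interpolate(low, size=(H, W), mode='bilinear',
                            align_corners=False)

        # Fuse: concatenate along channels and project back to dim channels.
        out = torch.cat([hi, low], dim=1).permute(0, 2, 3, 1)  # (B, H, W, 2C)
        return self.proj(out).permute(0, 3, 1, 2)              # (B, C, H, W)
```

For instance, `MixFrequencyAttention(dim=256)` applied to a 256-channel feature map of size 14×14 returns a feature map of the same shape, with windowed local detail and pooled global context combined.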
