Spatio-temporal matching for siamese visual tracking

Jinpu Zhang, Kaiheng Dai, Ziwen Li, Ruonan Wei, Yuehuan Wang

doi:10.1016/j.neucom.2022.11.093

Open Access

Abstract

Siamese trackers formulate visual tracking as a similarity matching problem solved via cross-correlation. Such methods struggle to track targets in the presence of distractors. We suspect two reasons: (1) irrelevant activated channels in the correlation map produce ambiguous matching results; (2) the pipeline matches frame by frame and cannot handle the response aberrance caused by temporal context variation. In this paper, we propose a spatio-temporal matching process to thoroughly explore the capability of 4-D matching in space (height, width and channel) and time. In spatial matching, we introduce a space-variant instance-aware correlation (SI-Corr) that recalibrates the channel-wise response differently at each matching position. SI-Corr can guide the generation of instance-aware features and distinguish the target from distractors at the instance level. In temporal matching, we design an aberrance repressed module (ARM) to investigate the short-term positional relationship between the target and distractors. ARM uses a simple optimization method to restrict abrupt changes between inter-frame response maps, which allows the network to learn temporal consistency in the distribution of the context structure. Moreover, we efficiently embed temporal consistency into the inference process. Experiments on six benchmarks (OTB100, VOT2018, VOT2020, GOT-10k, LaSOT and TrackingNet) demonstrate that the proposed method achieves state-of-the-art performance.
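The three ingredients named in the abstract (cross-correlation matching, position-dependent channel recalibration, and a penalty on abrupt inter-frame response changes) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation. The function names, the externally supplied `weights` tensor and the mean-squared-difference penalty are all assumptions made for illustration. In particular, the abstract does not say how SI-Corr produces its weights or which objective ARM optimizes.

```python
def cross_correlate(search, template):
    """Slide a 2-D template over a 2-D search region (no padding).

    Returns the response map whose entry (i, j) is the inner product of
    the template with the search window whose top-left corner is (i, j).
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    return [
        [
            sum(search[i + u][j + v] * template[u][v]
                for u in range(th) for v in range(tw))
            for j in range(sw - tw + 1)
        ]
        for i in range(sh - th + 1)
    ]


def recalibrate(channel_responses, weights):
    """Fuse per-channel response maps with position-dependent weights.

    ASSUMPTION: weights[c][i][j] is a hypothetical per-channel,
    per-position weight. In the paper such weights would come from a
    learned module; here they are simply passed in.
    """
    height, width = len(channel_responses[0]), len(channel_responses[0][0])
    return [
        [
            sum(w[i][j] * r[i][j] for w, r in zip(weights, channel_responses))
            for j in range(width)
        ]
        for i in range(height)
    ]


def aberrance_penalty(prev_response, curr_response):
    """Mean squared difference between two consecutive response maps.

    ASSUMPTION: a generic stand-in for "restricting abrupt changes";
    the abstract does not specify ARM's actual objective.
    """
    diffs = [
        (c - p) ** 2
        for prev_row, curr_row in zip(prev_response, curr_response)
        for p, c in zip(prev_row, curr_row)
    ]
    return sum(diffs) / len(diffs)
```

With a uniform weight tensor, `recalibrate` reduces to an ordinary channel sum. Letting the weights vary with position is what allows a channel that fires on a distractor to be suppressed at one location while still contributing at another. A large `aberrance_penalty` between consecutive frames would flag the kind of sudden response change that the temporal branch is meant to discourage.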

Full Text