Abstract

• A new target-cognisant Siamese anchor-free tracker is presented.
• The proposed method computes spatial cross-attention to refine the measurement of spatial similarity.
• Two tracking mechanisms are used to improve the precision of bounding box prediction.
• A max filtering module is proposed to filter out similar distractors.
• Our method achieves competitive performance on several tracking datasets.

Siamese trackers have become the mainstream framework for visual object tracking in recent years. However, a Siamese tracker extracts the template and search space features separately, which results in limited interaction between its classification and regression branches. This degrades the model's capacity to estimate the target accurately, especially when the target exhibits severe appearance variations. To address this problem, this paper presents a target-cognisant Siamese network for robust visual tracking. First, we introduce a new target-cognisant attention block that computes spatial cross-attention between the template and search branches to convey the relevant appearance information before correlation. Second, we advocate two mechanisms to improve the precision of the predicted bounding boxes in complex tracking scenarios. Finally, we propose a max filtering module that uses the guidance of the regression branch to filter out potentially interfering predictions in the classification map. Experimental results on challenging benchmarks demonstrate the competitive performance of the proposed method.
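
As a rough illustration of what spatial cross-attention between a template and a search region can look like, the following is a minimal sketch of generic scaled dot-product cross-attention. It is not the paper's attention block: the abstract does not specify its design, so the absence of learned projections, the residual connection, and the names `softmax` and `cross_attention` are all assumptions made for this example. Plain Python lists are used for readability; a real tracker would operate on feature tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(search, template):
    """Refine each search-location feature with template information.

    search:   list of N feature vectors, one per search-region location
    template: list of M feature vectors, one per template location
    Both share the same channel dimension C. Returns N refined vectors.

    Assumption: the template serves as both key and value, with no
    learned projections (hypothetical simplification for illustration).
    """
    dim = len(template[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in search:
        # Similarity of this search location to every template location.
        scores = [scale * sum(a * b for a, b in zip(query, key))
                  for key in template]
        weights = softmax(scores)
        # Attention-weighted sum of the template features.
        context = [sum(w * key[c] for w, key in zip(weights, template))
                   for c in range(dim)]
        # Residual connection keeps the original search feature.
        refined.append([x + y for x, y in zip(query, context)])
    return refined


if __name__ == "__main__":
    search_feats = [[1.0, 0.0], [0.0, 1.0]]
    template_feats = [[1.0, 0.0], [0.5, 0.5]]
    for vec in cross_attention(search_feats, template_feats):
        print(vec)
```

In this sketch each search location receives a template summary weighted by feature similarity, so the search features carry template appearance information before any correlation step is applied.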
