Abstract

Siamese trackers have achieved a good balance between accuracy and efficiency in generic object tracking. However, background distractors degrade the discriminative representation of the target. To reduce the sensitivity of trackers to background distractors, we propose a Double Branch Attention (DBA) block and DBA-Siam, a Siamese tracker equipped with the DBA block. First, the DBA block concatenates channels of multiple layers from the two branches of the Siamese framework to obtain a rich feature representation. Second, channel attention is applied to the two concatenated feature blocks to selectively enhance robust features, improving the ability to distinguish the target from a complex background. Finally, the DBA block collects the contextual relevance between the Siamese branches and adaptively encodes it into the feature weights of the detection branch for information compensation. Ablation experiments show that the proposed block enhances the discriminative representation of the target and significantly improves tracking performance. Results on two popular benchmarks show that DBA-Siam performs favorably against its counterparts. Compared with the advanced algorithm CSTNet, DBA-Siam improves the EAO by 18.9% on VOT2016.
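The first two steps (channel concatenation followed by channel attention) can be illustrated with a minimal sketch. It is a generic squeeze-and-excitation style gate, not the DBA block itself: the learned fully connected layers and the cross-branch encoding are omitted, and the function name `channel_attention` and the toy feature maps are assumptions made for illustration.

```python
import math


def channel_attention(features):
    """Reweight channels with a sigmoid gate on their global average.

    `features` is a list of channels, each a 2-D list of floats.
    Assumption: a real attention block would pass the pooled vector
    through learned layers before the sigmoid; this sketch skips them.
    """
    # Squeeze: global average pooling, one scalar per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excite: map each pooled value to a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-p)) for p in pooled]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[w * v for v in row] for row in channel]
        for channel, w in zip(features, weights)
    ]
    return scaled, weights


# Hypothetical feature maps from two layers of one branch.
shallow = [[[0.0, 0.0], [0.0, 0.0]]]            # 1 channel
deep = [[[2.0, 2.0], [2.0, 2.0]],               # 2 channels
        [[-2.0, -2.0], [-2.0, -2.0]]]

concatenated = shallow + deep                   # channel-wise concatenation
scaled, weights = channel_attention(concatenated)
```

Channels with a stronger average response receive a larger weight, which is the sense in which attention enhances features selectively.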

Highlights

  • We use the pre-training parameters provided by SiamRPN to initialize Double Branch Attention (DBA)-Siam and freeze the first three backbone networks to train the parameters of the target branch of the DBA block

  • 4c,c.1 contains two purple areas, which shows that the tracker without the block [...] we designed the DBA block for adaptive feature fusion

Summary

Introduction

Generic object tracking is a fundamental task in computer vision, with wide application in monitoring, autonomous driving [1,2], surgical detection [3], posture recognition [4], and industrial measurement [5]. Visual object tracking has seen many strong results, but the task remains challenging because of external factors such as target deformation, illumination changes, and background disturbance. These factors degrade the feature representation of objects, making it difficult for trackers to distinguish the target from background distractors and resulting in misdetection. The Siamese framework proposed by Bertinetto et al. [9] in 2016 achieves a good balance between accuracy and efficiency. It uses the same set of network parameters to extract deep features from the given target and the search input, and locates the target by computing the cross-correlation similarity between the two. However, some background distractors may produce features similar to those of the given target, leading the tracker to mistake them for the target.
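The localization step described above can be sketched in a few lines. This is a minimal single-channel illustration that uses raw values in place of deep features; the function names `cross_correlate` and `locate` and the toy arrays are assumptions, and an actual Siamese tracker would apply the same operation to multi-channel feature maps produced by the shared backbone.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map.

    Both inputs are 2-D lists; each output entry is the sum of
    element-wise products between the template and one search window.
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def locate(response):
    """Return the (row, col) index of the highest response."""
    positions = (
        (i, j)
        for i in range(len(response))
        for j in range(len(response[0]))
    )
    return max(positions, key=lambda ij: response[ij[0]][ij[1]])


# Toy "features": the template pattern sits in the middle of the search area.
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 1, 2, 0],
          [0, 3, 4, 0],
          [0, 0, 0, 0]]

response = cross_correlate(search, template)
peak = locate(response)
```

A background region whose features resemble the template would produce a second high peak in `response`, which is how distractors can mislead the tracker.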
