Abstract

Existing Siamese-network-based trackers are easily disturbed by large deformation, occlusion, and distractor objects in the background. By comparing these trackers, we observe that their monotonous positive pairs usually cover only a limited set of challenging factors (occlusion, deformation, etc.), which may make the learned features less robust. In addition, the foreground information in the large body of training data is used directly, without deeper exploration, so the trackers cannot effectively discriminate the foreground from semantic backgrounds. In this paper, we improve the Siamese tracker by enriching the positive pairs and making fuller use of the foreground information. During the offline training phase, a simple sampling strategy enriches the challenging factors in the positive pairs, which enhances the robustness of the tracker. At the same time, we highlight the foreground information by padding the background and use this information to construct a novel padding loss, which guides the tracker to pay less attention to distractors in the background. Moreover, an improved feature information fusion is adopted to update the template so that the tracker can adapt to drastic appearance changes. Comprehensive experiments on the OTB and VOT benchmarks demonstrate that the proposed tracker achieves outstanding performance in both accuracy and robustness.
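To make the idea of "padding the background" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the foreground is given as an axis-aligned box and that the background is replaced by a constant fill value (by default the per-channel image mean). The function name `pad_background`, the box convention, and the choice of fill value are all hypothetical; the abstract specifies neither the padding scheme nor the form of the padding loss.

```python
import numpy as np


def pad_background(image, box, fill=None):
    """Return a copy of `image` with everything outside `box` set to `fill`.

    image: H x W x C array.
    box:   (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive (assumed convention).
    fill:  scalar or per-channel value; defaults to the per-channel image mean,
           which is an illustrative assumption, not taken from the paper.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds so the slices below are always valid.
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    if fill is None:
        fill = image.mean(axis=(0, 1))
    padded = np.empty_like(image)
    padded[...] = fill  # broadcast the fill value over every pixel
    # Copy the foreground region back unchanged.
    padded[y0:y1, x0:x1] = image[y0:y1, x0:x1]
    return padded
```

A padded crop of this kind keeps only the target appearance, so a loss computed from it could, in principle, penalize responses that depend on background content.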

Highlights

  • Visual tracking is one of the most important directions in the field of computer vision

  • During the offline training phase, a simple sampling strategy is adopted to enrich the challenging factors in positive pairs, which can effectively enhance the robustness of the tracker

  • We adopt an improved adaptive feature information fusion to update the template, so that the tracker can adapt to drastic appearance changes
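As a rough illustration of template updating by feature fusion, the sketch below blends the stored template features with features extracted from the latest frame using a fixed linear interpolation. This is a deliberately simple stand-in: the "improved adaptive feature information fusion" is not described in this excerpt, and the function name `update_template` and the `rate` parameter are hypothetical.

```python
import numpy as np


def update_template(template_feat, current_feat, rate=0.1):
    """Blend stored template features with features from the latest frame.

    A fixed-rate linear interpolation, used here only as an illustrative
    assumption; the paper's adaptive fusion rule may differ.
    """
    if template_feat.shape != current_feat.shape:
        raise ValueError("feature maps must have the same shape")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    # rate = 0 keeps the old template; rate = 1 replaces it entirely.
    return (1.0 - rate) * template_feat + rate * current_feat
```

A small `rate` keeps the template close to the first-frame appearance and limits drift, while a larger `rate` follows appearance changes more quickly.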


Summary

Introduction

Visual tracking is one of the most important directions in the field of computer vision. Siamese-network-based trackers [11]–[17] have drawn great attention in the tracking field owing to their balance of speed and accuracy. By formulating visual tracking as a matching problem, Siamese trackers aim to learn a general similarity function offline from a large set of training videos. Among these trackers, the SiamFC tracker [11] first utilizes the fully-convolutional structure to achieve the

