Abstract
Existing Siamese network based trackers are easily disturbed by large deformation, occlusion and distractor objects in the background. By comparing these trackers, we observe that the monotonous positive pairs usually have limited challenging factors (Occlusion, Deformation, etc.), which may make the learned features less robust. In addition, the foreground information of the substantial training data is utilized directly without deeper exploration. Thus, the trackers cannot effectively discriminate the foreground from the semantic backgrounds. In this paper, we focus on modifying the Siamese tracker by enriching the positive pairs and taking further advantage of the foreground information. During the offline training phase, a simple sampling strategy is adopted to enrich the challenging factors in positive pairs, which can effectively enhance the robustness of the tracker. At the same time, we highlight the foreground information by padding the background, and the information is utilized to generate a novel padding loss, which guides the tracker to pay less attention to the distractors in the background. Moreover, an improved feature information fusion is adopted to update the template, so that the tracker can adapt to the drastic appearance changes. Comprehensive experiments on the OTB and the VOT benchmarks demonstrate that our proposed tracker can achieve outstanding performance in both accuracy and robustness.
Highlights
Visual tracking is one of the most important directions in the field of computer vision
During the offline training phase, a simple sampling strategy is adopted to enrich the challenging factors in positive pairs, which can effectively enhance the robustness of the tracker
We adopt an improved adaptive feature information fusion to update the template, so that the tracker can adapt to the drastic appearance changes
Summary
Visual tracking is one of the most important directions in the field of computer vision. Siamese network based trackers [11]–[17] have drawn great attention in the tracking field owing to their balanced speed and accuracy. By defining the visual tracking as a matching problem, Siamese trackers aim to learn a general similarity function offline from substantial training videos. Among these trackers, the SiamFC tracker [11] first utilizes the fully-convolutional structure to achieve the
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.