Abstract

In essence, visual tracking is a matching problem in which no prior information about the class-agnostic target object is available. By leveraging large-scale offline training data, recent trackers based on Siamese networks usually pre-learn an underlying similarity function before the tracking task even begins. Consequently, they lack discriminative and adaptive power. To address these issues, we propose a multi-stage co-inference tracker (named MSCI) built on a multi-task Siamese network, in which the complex tracking task is divided into three complementary sub-tasks (i.e., classification, regression and detection). First, we design a novel multi-task loss function that trains the multi-task Siamese network end to end by jointly learning the three sub-tasks. The multi-task Siamese network contains three parallel yet collaborative output layers, which correspond to the three key components of our tracker (i.e., the classifier, the regressor and the residual learning based detector). By sharing representations among the components, we not only improve each component's generalization performance but also enhance our tracker's discriminative power. Then, we design a co-inference approach to effectively fuse the complementary components. As a result, our tracker avoids the pitfalls of relying on any single component and obtains reliable observations that improve its adaptive power. Comprehensive experiments on OTB2013, OTB2015 and VOT2016 validate the effectiveness and robustness of our MSCI tracker.
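The abstract does not state the form of the multi-task loss. As a minimal sketch only, assuming a SiamFC-style logistic loss on the classifier's score map, a smooth-L1 loss on the regressor's box offsets, a logistic loss on the detector's scores, and hypothetical weights `w_cls`, `w_reg` and `w_det`, a jointly trained objective could be the weighted sum below. The function names and weights are illustrative assumptions, not the MSCI formulation.

```python
import math
from typing import Sequence


def logistic_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean log(1 + exp(-y * s)) over score-map positions, labels y in {-1, +1}."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    total = 0.0
    for s, y in zip(scores, labels):
        z = y * s
        # Numerically stable form of log(1 + exp(-z)).
        total += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / len(scores)


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth-L1 loss over box-offset coordinates (quadratic below beta)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def multi_task_loss(cls_scores, cls_labels, reg_pred, reg_target,
                    det_scores, det_labels,
                    w_cls: float = 1.0, w_reg: float = 1.0, w_det: float = 1.0) -> float:
    """Hypothetical joint objective: weighted sum of the three sub-task losses."""
    return (w_cls * logistic_loss(cls_scores, cls_labels)
            + w_reg * smooth_l1(reg_pred, reg_target)
            + w_det * logistic_loss(det_scores, det_labels))
```

Because all three terms would back-propagate into a shared backbone, minimizing their sum is one way to realize the representation sharing described above; the relative weights would have to be tuned.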

Highlights

  • Visual tracking is a fundamental task in computer vision, with a variety of applications including security surveillance, navigation, and robotics

  • To address the limited discriminative and adaptive power of Siamese-network trackers, we propose a multi-stage co-inference hybrid tracker via a multi-task Siamese network

  • We propose a multi-task Siamese network for visual tracking, in which a novel loss function is designed to jointly train three components via an end-to-end strategy


Summary

INTRODUCTION

Visual tracking is a fundamental task in computer vision, with a variety of applications including security surveillance, navigation, and robotics. To capture large appearance changes and improve the adaptivity of our tracker, the residual learning based detector is jointly trained with the classifier and regressor via an end-to-end strategy. Based on feedback from the residual learning based detector, the classifier and regressor can improve their tracking results by adjusting the target templates. By enforcing such a multi-stage co-inference approach, we boost our tracker's discriminative power and greatly improve its adaptive power. As our second contribution, we propose a multi-task Siamese network for visual tracking, in which a novel loss function is designed to jointly train three components (i.e., the classifier, the regressor and the residual learning based detector) via an end-to-end strategy. The residual learning based detector effectively captures large appearance changes and alleviates the rapid model degradation that such changes cause.
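The detector-to-template feedback loop described above can be illustrated with a small sketch. It assumes, purely for illustration, that the Siamese result (classifier plus regressor) is kept while its confidence stays above a hypothetical `threshold`, that the detector's observation is adopted only when the Siamese confidence is low and the detector is more confident, and that the template is then adjusted by linear interpolation toward the detector's feature with a hypothetical `rate`. The names `Observation`, `co_infer`, `update_template` and `track_step`, and the interpolation rule, are assumptions rather than the MSCI co-inference procedure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


@dataclass
class Observation:
    box: Box
    confidence: float  # assumed to lie in [0, 1]


def co_infer(siamese: Observation, detector: Observation,
             threshold: float = 0.5) -> Tuple[Observation, bool]:
    """Return the chosen observation and whether the detector's result was used."""
    if siamese.confidence >= threshold or detector.confidence <= siamese.confidence:
        return siamese, False
    return detector, True


def update_template(template: List[float], feature: List[float],
                    rate: float = 0.1) -> List[float]:
    """Move the template a fraction `rate` toward the new feature vector."""
    if len(template) != len(feature):
        raise ValueError("template and feature must have the same length")
    return [(1.0 - rate) * t + rate * f for t, f in zip(template, feature)]


def track_step(template: List[float], siamese: Observation, detector: Observation,
               detector_feature: List[float], threshold: float = 0.5,
               rate: float = 0.1) -> Tuple[Observation, List[float]]:
    """One hypothetical tracking step: fuse observations, then adjust the template."""
    chosen, used_detector = co_infer(siamese, detector, threshold)
    if used_detector:
        # Detector feedback: adapt the template toward the new appearance.
        template = update_template(template, detector_feature, rate)
    return chosen, template
```

Keeping the Siamese result by default and switching only when it is weak is one simple way to avoid depending on a single component; in a real tracker, `detector_feature` would be extracted from the shared backbone at the detector's box.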

RELATED WORK
TRAINING DATA COLLECTION FOR MULTI-TASK SIAMESE NETWORK
THE RESIDUAL LEARNING BASED DETECTOR
EXPERIMENTS
IMPLEMENTATION DETAILS: Experimental platform and computer configuration
EVALUATION ON OTB2013 AND OTB2015: Quantitative evaluation
ABLATION STUDY: Self-comparison
Findings
CONCLUSION