Abstract

Existing deep trackers mainly use deep neural networks pre-trained on object recognition training sets to generate deep features as the target representation. However, pre-trained deep features are not effective in representing arbitrary target objects, which are likely to be unseen by the pre-trained networks. To narrow this gap in representation capability, we propose to transfer the objectness information contained in pre-trained deep networks. The transferred objectness information is used to generate deep features that are aware of arbitrary forms of target objects, enabling robust visual tracking. Specifically, we design a novel network branch on top of pre-trained deep models to perform incremental transfer learning. The learned network, equipped with the transferred objectness information, precisely locates target objects undergoing large appearance changes. Experimental results on standard benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art trackers.

Highlights

  • Visual tracking plays an important role in advancing numerous vision applications [7], [8], [16]

  • We propose to transfer the objectness information within pre-trained deep networks to narrow the gap between the representation capability of pre-trained deep features in recognition and that in tracking

  • We show the effectiveness of our Objectness Transfer Network (OTN) which transfers objectness information to facilitate tracking

Summary

INTRODUCTION

Visual tracking plays an important role in advancing numerous vision applications [7], [8], [16]. Existing methods often learn decision-making modules online, e.g., Discriminative Correlation Filters (DCFs) [9] and Support Vector Machines (SVMs) [21]. These decision-making modules mine discriminative information in pre-trained deep features to adapt to target objects of arbitrary forms. Once the representation capability of the pre-trained deep features is weakened by such arbitrary target forms, the decision-making modules cannot be trained well. In our approach, the deep feature from the pre-trained model is updated by the feature from a learnable network so as to minimize the proposed objectness loss. This strategy shifts the feature subspace pertaining to the target object in the current video toward the feature subspace pertaining to the objects in the recognition training sets. Our method shows favorable performance against the state-of-the-art trackers.
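The mechanism described above can be pictured as a frozen, pre-trained backbone supplying the deep feature, with a small learnable branch that shifts that feature so a proposed objectness loss decreases. The PyTorch sketch below is a minimal illustration of this incremental-transfer idea; the VGG-16 backbone, the branch architecture, the prototype target, and the objectness_loss surrogate are our assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

# Frozen backbone pre-trained on a recognition dataset (ImageNet here).
backbone = models.vgg16(pretrained=True).features
for p in backbone.parameters():
    p.requires_grad = False  # pre-trained weights stay fixed

class ObjectnessBranch(nn.Module):
    """Hypothetical objectness-transfer branch: a small learnable head
    that refines the frozen features for the current, possibly unseen,
    target object."""
    def __init__(self, channels=512):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, feat):
        # Residual update: the pre-trained feature is shifted by the
        # output of the learnable network.
        return feat + self.refine(feat)

def objectness_loss(refined_feat, object_prototype):
    """Illustrative surrogate loss: pull the refined target feature
    toward a prototype representing 'object-like' features from the
    recognition training sets. The paper's actual loss may differ."""
    pooled = refined_feat.mean(dim=(2, 3))  # global average pooling
    return F.mse_loss(pooled, object_prototype)

# One incremental-transfer step on a target patch from the first frame.
branch = ObjectnessBranch()
optimizer = torch.optim.SGD(branch.parameters(), lr=1e-3)
patch = torch.randn(1, 3, 224, 224)   # placeholder target patch
prototype = torch.randn(1, 512)       # placeholder objectness prototype
feat = backbone(patch)                # frozen deep feature
optimizer.zero_grad()
loss = objectness_loss(branch(feat), prototype)
loss.backward()
optimizer.step()
```

The residual update in ObjectnessBranch.forward mirrors the description that the pre-trained deep feature is updated by the feature from the learnable network, while the backbone itself stays fixed; only the branch is trained online.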

RELATED WORK
OBJECTIVE FUNCTION
VISUALIZATION
TRACKING RESULTS
OFFLINE TRAINING
INITIALIZATION
ONLINE DETECTION
EXPERIMENTAL VALIDATIONS
CONCLUSION AND FUTURE WORK