Abstract

Offline-trained Siamese networks are not robust to the environmental complications of visual object tracking. Without online learning, a Siamese network cannot exploit instance-level domain knowledge or adapt to appearance changes of the target. In this paper, a new lightweight Siamese network is proposed for feature extraction. To cope with the dynamics of targets and backgrounds, the weights of the proposed Siamese network are updated online during the tracking process. To enhance discrimination capability, the cross-entropy loss is integrated into the contrastive loss. Inspired by the face verification algorithm DeepID2, a Bayesian verification model is applied for candidate selection; in general, visual object tracking can benefit from face verification algorithms. Numerical results suggest that the newly developed algorithm achieves comparable performance on public benchmarks.
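The integration of cross-entropy loss into the contrastive loss described above can be sketched as follows. The paper's exact formulation (the margin, the trade-off weight, and how a match probability is produced from the feature distance) is not reproduced here, so `margin`, `lam`, and the distance-to-probability mapping below are illustrative assumptions:

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    # y = 1 for a matching pair, 0 for a non-matching pair; d = Euclidean distance
    return y * d**2 + (1 - y) * np.maximum(margin - d, 0.0)**2

def cross_entropy_loss(p, y, eps=1e-12):
    # Binary cross-entropy on the predicted match probability p
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def improved_loss(f1, f2, y, margin=1.0, lam=0.5):
    # Combine the two losses; lam is an assumed trade-off weight.
    d = np.linalg.norm(f1 - f2)
    p = 1.0 / (1.0 + np.exp(d - margin))  # map distance to a pseudo-probability
    return contrastive_loss(d, y, margin) + lam * cross_entropy_loss(p, y)
```

A matching pair of identical features should incur a much smaller combined loss than the same pair labeled as non-matching, which is what drives the network toward more discriminative features.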

Highlights

  • As a fundamental and challenging task, visual object tracking has a variety of applications, such as smart video surveillance, autopilot, human–computer interaction and video communication [1,2,3]. Its goal is to estimate the position and scale variation of the target in a video sequence, where the target's initial state is given in the first frame.

  • To illustrate the characteristics of the proposed algorithm, the improved contrastive loss (OSNV) tracker is compared with nine state-of-the-art tracking methods. According to their working principles, these algorithms can be classified into four classes: (i) Siamese-like tracking algorithms, including SiamFC_3s [7] and SINT_noflow [9], both of which train an offline Siamese network to extract feature vectors; (ii) algorithms based on convolutional neural networks (CNNs), e.g., MDNet [10] and SANet [11]; (iii) algorithms based on correlation filters, e.g., ECO [12], KCF [32] and MCPF [13]; (iv) algorithms based on hand-crafted features, e.g., MEEM [33] and TGPR [34].

Summary

Introduction

As a fundamental and challenging task, visual object tracking has a variety of applications, such as smart video surveillance, autopilot, human–computer interaction and video communication [1,2,3]. To exploit the representation capabilities of CNNs, Tao et al. [9] proposed a matching function based on a Siamese network to extract feature vectors, named Siamese Instance Search for Tracking (SINT); this method was trained with the contrastive loss. Different from SiamFC and SINT, the algorithms MDNet and SANet train an offline model and update part of it in the inference phase; these two algorithms are supervised by the logistic loss and have achieved superior performance on the online tracking benchmark (OTB). The contributions of this paper are as follows: the proposed tracker can learn from the domain knowledge of the target and adapt to its appearance changes; an improved contrastive loss integrated with the cross-entropy loss is introduced to update the Siamese network; and the Bayesian verification model is transferred for candidate selection.
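The transferred Bayesian verification step can be illustrated with a simplified sketch: the feature vector of each candidate is compared with the target template via a log-likelihood ratio under "same target" vs. "different target" hypotheses, and the highest-scoring candidate is selected. The diagonal-Gaussian models and helper names below are assumptions for illustration only; DeepID2-style joint Bayesian verification learns full covariance models offline.

```python
import numpy as np

def verification_score(x, t, var_intra, var_extra):
    """Log-likelihood ratio of 'same target' vs. 'different target' for the
    feature difference x - t, under zero-mean diagonal-Gaussian models.
    var_intra / var_extra are assumed per-dimension variances estimated offline."""
    d = x - t
    ll_intra = -0.5 * np.sum(d**2 / var_intra + np.log(2 * np.pi * var_intra))
    ll_extra = -0.5 * np.sum(d**2 / var_extra + np.log(2 * np.pi * var_extra))
    return ll_intra - ll_extra

def select_candidate(candidates, template, var_intra, var_extra):
    # Pick the candidate feature vector with the highest verification score.
    scores = [verification_score(c, template, var_intra, var_extra)
              for c in candidates]
    return int(np.argmax(scores))
```

A candidate whose features lie close to the template receives a positive score (the intra-class model explains the difference better), while a distant candidate receives a negative one, so ranking by this score implements the candidate selection.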

Siamese Network for Visual Object Tracking
Online Algorithms for Visual Object Tracking
Loss Function for CNNs in Visual Tracking
Bayesian Verification Model
Proposed Algorithm
Siamese Network
Cross-Entropy Loss
Contrastive Loss
Improved Contrastive Loss
Implementation of the Bayesian Verification Model
Implementation Details
Experimental Validations
Ablation Study
Evaluation on OTB-2013
Evaluation on OTB-2015
Evaluation on OTB-50
Evaluation on VOT-2016
Evaluation on TempleColor
Qualitative Evaluation
Failure Case
Conclusions
