Abstract

Long-term visual tracking is one of the most challenging problems in computer vision and is closer to the needs of practical applications. In long-term video sequences, the tracked target often undergoes dramatic appearance changes over time due to factors such as scale variation, illumination change and occlusion. In this work, we propose a novel, robust long-term tracking framework based on continual learning and a dynamic sample set module. We cast the online tracking process as continual learning of the target model, continuously learning the various appearance changes to adapt to different scenarios. The continual learning module distills the beneficial knowledge of the old network into the new network through warm-up and joint training, achieving a comprehensive and holistic memory of the target appearance. Combined with the dynamic sample set, the model effectively balances short-term and long-term memory and establishes a near-complete description of the target appearance over the long-term dimension, allowing it to cope with a variety of challenging situations. Experimental results on the large-scale long-term benchmark datasets LaSOT and UAV20L show that the proposed method performs favourably against other state-of-the-art trackers.
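
No implementation details are given on this page, but as a rough illustration of the warm-up and joint-training scheme mentioned above, the following PyTorch-style sketch shows one way knowledge could be distilled from an old target model into a new one during an online update. The function name `continual_update`, the MSE losses, the step counts and `distill_weight` are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a continual-learning-style model update with distillation.
# All names, loss choices and weights are illustrative assumptions,
# not the authors' implementation.
import torch
import torch.nn.functional as F

def continual_update(new_model, old_model, samples, labels, optimizer,
                     warmup_steps=10, joint_steps=30, distill_weight=0.5):
    """Warm-up: fit the new model to the current samples only.
    Joint training: add a distillation term that keeps the new model's
    responses close to the old model's, preserving knowledge of earlier
    target appearances."""
    old_model.eval()
    for step in range(warmup_steps + joint_steps):
        optimizer.zero_grad()
        scores = new_model(samples)                  # responses on the current sample set
        task_loss = F.mse_loss(scores, labels)       # fit the current target appearance
        if step < warmup_steps:
            loss = task_loss                         # warm-up phase: new data only
        else:
            with torch.no_grad():
                old_scores = old_model(samples)      # old network's "memory" of past appearances
            loss = task_loss + distill_weight * F.mse_loss(scores, old_scores)
        loss.backward()
        optimizer.step()
    return new_model
```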

Highlights

  • Visual object tracking is a fundamental problem in computer vision and video processing

  • Our method achieves the best performance on all 12 attributes, including Aspect Ratio Change (ARC) (63.4%), Background Clutter (BC) (62.5%), Camera Motion (CM) (68.6%), Fast Motion (FM) (63.2%), Full Occlusion (FOC) (55.3%), Illumination Variation (IV) (73.2%), Low Resolution (LR) (66.7%), Out-of-View (OV) (60.1%), Partial Occlusion (POC) (67.8%), Similar Object (SOB) (72.3%), Scale Variation (SV) (68.6%) and Viewpoint Change (VC) (60.1%)

  • Compared with the second-ranked tracker on each attribute, our method improves significantly on the ARC, FM, IV, LR, SOB and SV attributes, by 12.6%, 15.1%, 18.4%, 14.2%, 15.8% and 10.6%, respectively


Summary

INTRODUCTION

Visual object tracking is a fundamental problem in computer vision and video processing. Methods that update their classifiers with training samples collected on-the-fly cannot maintain long-term memory of the target appearance and may even suffer from catastrophic forgetting. The model over-fits to the current appearance change, and a complete target appearance model with a holistic view cannot be established in the long-term dimension. Combined with training sample sets with long-term and short-term memory that are dynamically constructed during tracking, the proposed method can continuously learn the various appearance changes of the target, maintain a near-complete memory of the target appearance, and establish a more comprehensive target appearance model. The main contributions of this work are as follows: (1) For long-term visual tracking, a continual learning framework is proposed to build a comprehensive target appearance model online. (2) A dynamic sample set construction method with long-term and short-term memory is designed. (3) Extensive evaluation on several long-term benchmark datasets validates that the proposed tracker improves the overall performance by a large margin.
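
As a rough illustration of contribution (2), the sketch below shows one plausible way to maintain a sample set with separate short-term and long-term memory during tracking. The class name, buffer sizes and the FIFO/reservoir eviction policy are assumptions for illustration only; the paper's actual construction may differ.

```python
import random

class DynamicSampleSet:
    """Hypothetical sample buffer with short-term (recent frames, FIFO) and
    long-term (diverse history, reservoir-sampled) memory. Sizes and the
    eviction policy are illustrative assumptions, not the paper's method."""
    def __init__(self, short_capacity=15, long_capacity=35):
        self.short_capacity = short_capacity
        self.long_capacity = long_capacity
        self.short_term = []   # most recent target samples
        self.long_term = []    # samples kept across the whole sequence
        self.seen = 0          # number of samples offered to long-term memory

    def add(self, sample):
        # Short-term memory: keep only the newest samples (FIFO).
        self.short_term.append(sample)
        if len(self.short_term) > self.short_capacity:
            evicted = self.short_term.pop(0)
            # Long-term memory: reservoir sampling keeps a roughly uniform
            # snapshot of all past appearances.
            self.seen += 1
            if len(self.long_term) < self.long_capacity:
                self.long_term.append(evicted)
            else:
                j = random.randrange(self.seen)
                if j < self.long_capacity:
                    self.long_term[j] = evicted

    def training_batch(self):
        # A joint batch balances recent and historical appearances.
        return self.short_term + self.long_term
```

In an online tracker, `add` would be called once per tracked frame and `training_batch` whenever the target model is refined.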

RELATED WORK
DYNAMIC SAMPLE SET CONSTRUCTION
ONLINE TRACKING PROCESS
EXPERIMENTS
CONCLUSION