Abstract

Discriminative correlation filters (DCFs) have shown strong performance in visual object tracking. However, tracking remains challenging when the target undergoes occlusion, deformation, scale changes, or illumination changes. In this paper, we exploit the hierarchical features of convolutional neural networks (CNNs) and learn a spatial-temporal context correlation filter on convolutional layers. The translation is then estimated by fusing the response scores of the filters on the three convolutional layers. For scale estimation, we learn a discriminative correlation filter that estimates scale from the best confidence results. Furthermore, we propose a re-detection activation discrimination method to improve robustness in the case of tracking failure, and an adaptive model update method to reduce tracking drift caused by noisy updates. We evaluate the proposed tracker with DCFs and deep features on OTB benchmark datasets. The tracking results demonstrate that the proposed algorithm is superior to several state-of-the-art DCF methods in terms of accuracy and robustness.

Highlights

  • Visual object tracking is a basic task in computer vision, with a wide range of applications such as autonomous driving, robotics, video surveillance, and human-machine interaction [1,2]. Given the target in the initial frame, effectively determining its position in subsequent frames is a difficult problem

  • To address the model update problem, we propose a novel update method that is equivalent to an active mode

  • The proposed tracker is implemented in MATLAB2014a on a PC with an i7 3.2 GHz CPU with


Summary

Introduction

Given the target in the initial frame, effectively determining its position in subsequent frames is a difficult problem. Tracking methods should be able to overcome various challenges, including background clutter, illumination changes, scale variation, motion blur, and partial occlusion. The main goal is to find the area with the highest classifier confidence, which is taken as the target location, and to use the tracking result as a sample to update the classifier.
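The detect-then-update loop described above can be illustrated with a minimal sketch. It is not the tracker proposed in the paper: the function names `locate_peak` and `update_model`, the linear-interpolation update, and the `learning_rate` and `min_confidence` values are all assumptions chosen for illustration, and the response map is a toy grid rather than the output of a real correlation filter.

```python
from typing import List, Tuple

Map = List[List[float]]  # a 2-D grid of floats (response map or model template)


def locate_peak(response: Map) -> Tuple[int, int, float]:
    """Return (row, col, score) of the highest-confidence cell in the response map."""
    best = (0, 0, response[0][0])
    for r, row in enumerate(response):
        for c, score in enumerate(row):
            if score > best[2]:
                best = (r, c, score)
    return best


def update_model(model: Map, sample: Map, peak_score: float,
                 learning_rate: float = 0.1,      # hypothetical value
                 min_confidence: float = 0.3      # hypothetical value
                 ) -> Map:
    """Blend the new sample into the model by linear interpolation.

    The update is skipped when the peak confidence is low, which is one simple
    (assumed) way to avoid learning from unreliable tracking results.
    """
    if peak_score < min_confidence:
        return model  # keep the old model unchanged
    return [
        [(1.0 - learning_rate) * m + learning_rate * s
         for m, s in zip(model_row, sample_row)]
        for model_row, sample_row in zip(model, sample)
    ]


# Toy example: the peak of the response map is taken as the target location.
response = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.3],
    [0.1, 0.3, 0.2],
]
row, col, score = locate_peak(response)

# The tracking result is then used as a sample to update the model.
model = [[0.0, 0.0], [0.0, 0.0]]
sample = [[1.0, 1.0], [1.0, 1.0]]
model = update_model(model, sample, score)
```

Skipping the update below a confidence threshold is only one possible policy; it is shown here because noisy updates are a common source of tracking drift.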

