Abstract

Visual tracking is a fundamental vision task that aims to locate instances of target objects across video frames and images. It has attracted much attention because it provides basic semantic information for numerous applications. Over the past ten years, visual tracking has made great progress, but significant challenges remain in many real-world applications. The appearance of a target can change substantially due to pose variation, occlusion, and sudden movement, which can lead to abrupt target loss. This paper builds a hybrid tracker that combines deep features with correlation filters to address this challenge, and verifies its effectiveness. Specifically, an effective visual tracking method is proposed to address the low tracking accuracy caused by the limitations of traditional hand-crafted feature models; the rich hierarchical features of Convolutional Neural Networks (CNNs) are then fused across multiple layers to improve the tracker's learning accuracy. Finally, extensive experiments on the benchmark data sets OTB-100 and OTB-50 show that the proposed algorithm is effective.

Highlights

  • With the development of artificial intelligence and computer vision systems, visual tracking has become increasingly important in the field of computer vision applications

  • Some existing deep-learning-based trackers [10,11,12,13] collect positive and negative training samples near the estimated target position and incrementally train a classifier on features extracted by a CNN

  • The Multiple Experts using Entropy Minimization (MEEM) tracker performed well on deformation, rotation, and occlusion sequences, but failed under background clutter and fast motion, because its quantized color channel feature is less effective at handling cluttered backgrounds

Summary

Introduction

With the development of artificial intelligence and computer vision systems, visual tracking has become increasingly important in the field of computer vision applications. Some existing deep-learning-based trackers [10,11,12,13] collect positive and negative training samples near the estimated target position and incrementally train a classifier on features extracted by a CNN. This raises two issues. First, most target recognition algorithms use a neural network as an online classifier and take only the output of the last layer as the target representation; this is conducive to expressing the semantics of the target but discards fine spatial detail. Second, training a robust neural network classifier requires a large number of diverse samples, which is hard to obtain in a real-time visual tracking system. To address these issues, we apply adaptive correlation filtering to the features extracted by each layer of the CNN and use the filters jointly to infer the target position. Our work in this paper proceeds along the following three lines.
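To make the correlation-filtering step concrete, the following is a minimal NumPy sketch of a frequency-domain correlation filter (in the MOSSE/DCF style) trained per feature layer, with a simple weighted fusion of the per-layer response maps. The function names, the single-channel features, and the fixed fusion weights are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def train_filter(feat, target_y, lam=1e-4):
    """Learn a correlation filter in the Fourier domain (MOSSE/DCF style).

    feat:     2-D feature map from one CNN layer (single channel here).
    target_y: desired Gaussian-shaped response, peaked at the target center.
    lam:      regularization term to avoid division by near-zero spectra.
    """
    X = np.fft.fft2(feat)
    Y = np.fft.fft2(target_y)
    # Elementwise per-frequency solution: H = Y * conj(X) / (X * conj(X) + lam)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def response_map(filt, feat):
    """Correlate a new feature map with the learned filter."""
    Z = np.fft.fft2(feat)
    return np.real(np.fft.ifft2(filt * Z))

def fuse_responses(responses, weights):
    """Weighted sum of per-layer response maps (fixed weights are an
    assumption; a coarse-to-fine weighting scheme is also common)."""
    return sum(w * r for w, r in zip(weights, responses))

def locate(fused):
    """Estimated target position = argmax of the fused response map."""
    return np.unravel_index(np.argmax(fused), fused.shape)

# Toy usage: one synthetic "layer" feature and a Gaussian label map.
h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]
target_y = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * 2.0 ** 2))
rng = np.random.default_rng(0)
feat = rng.standard_normal((h, w))

filt = train_filter(feat, target_y)
resp = response_map(filt, feat)
fused = fuse_responses([resp], [1.0])
pos = locate(fused)  # peaks at the label's center on the training frame
```

In a full tracker, one filter would be trained per CNN layer and updated online each frame, and the fused response would be computed on a search window around the previous position.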

