Abstract
Tracking-by-detection methods have attracted growing interest in visual object tracking in recent years because of their excellent tracking performance. However, most existing methods use a fixed target scale, which makes them unreliable under large scale variations in complex scenes. In this paper, we decompose tracking into target translation estimation and scale estimation. We adopt a scale estimation approach within the tracking-by-detection framework, develop a new model update scheme, and present a robust correlation tracking algorithm based on discriminative correlation filters. The approach learns a translation correlation filter and a scale correlation filter. The target translation and scale are obtained by locating the maximum output response of each learned filter, and the target models are then updated online. Extensive experimental results on 12 challenging benchmark sequences show that, compared with the second-best performer among five other well-performing existing tracking algorithms, the proposed approach reduces the average center location error (CLE) by 6.8 pixels and improves the average success rate (SR) by 17.5% and the average distance precision (DP) by 5.4%. The approach is also robust to appearance variations caused by scale variation, pose variation, illumination changes, partial occlusion, fast motion, rotation, and background clutter.
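The learn, detect, and update loop described above can be illustrated with a minimal single-channel correlation filter. This is a sketch under stated assumptions, not the authors' implementation: it uses a MOSSE-style closed-form filter (numerator and denominator kept in the Fourier domain), raw sample values instead of the paper's features, a Gaussian-shaped desired response, and a plain linear-interpolation update with a hypothetical learning rate `eta` standing in for the paper's own model update scheme. The names `CorrelationFilter`, `gaussian_response`, `sigma`, `lam`, and `eta` are illustrative. Because NumPy's `fftn` works in any number of dimensions, the same class can act as a 2-D translation filter on an image patch and as a 1-D scale filter over a vector of scale samples.

```python
import numpy as np


def gaussian_response(shape, sigma):
    """Desired filter output: a Gaussian peak at the window centre (1-D or 2-D)."""
    axes = np.meshgrid(*[np.arange(n) - n // 2 for n in shape], indexing="ij")
    dist2 = sum(a.astype(float) ** 2 for a in axes)
    return np.exp(-0.5 * dist2 / sigma ** 2)


class CorrelationFilter:
    """Single-channel MOSSE-style filter H* = (G . conj(F)) / (|F|^2 + lam).

    Hypothetical illustration: raw samples, no cosine window, no features.
    """

    def __init__(self, sample, sigma=2.0, lam=1e-2, eta=0.025):
        self.lam = lam  # regularisation, avoids division by values near zero
        self.eta = eta  # assumed learning rate of the linear-interpolation update
        self.G = np.fft.fftn(gaussian_response(sample.shape, sigma))
        self.num, self.den = self._terms(sample)

    def _terms(self, sample):
        """Filter numerator and denominator for one training sample."""
        F = np.fft.fftn(sample)
        return self.G * np.conj(F), np.abs(F) ** 2

    def response(self, sample):
        """Correlation output of the learned filter on a new sample."""
        Z = np.fft.fftn(sample)
        return np.real(np.fft.ifftn(self.num * Z / (self.den + self.lam)))

    def locate(self, sample):
        """Offset of the response maximum from the window centre."""
        resp = self.response(sample)
        peak = np.unravel_index(np.argmax(resp), resp.shape)
        return tuple(int(p) - n // 2 for p, n in zip(peak, resp.shape))

    def update(self, sample):
        """Online model update by linear interpolation (stand-in scheme)."""
        num, den = self._terms(sample)
        self.num = (1 - self.eta) * self.num + self.eta * num
        self.den = (1 - self.eta) * self.den + self.eta * den
```

For translation, calling `locate` on a patch cropped at the previous target position returns the (row, column) displacement of the target. For scale, a 1-D filter trained on samples taken at a set of scale steps around the current size would return the offset of the best-matching step; the step spacing (for example a factor such as 1.02 per step) is a hypothetical choice here, not a value taken from the paper.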
Highlights
Visual tracking, as a fundamental step in exploring videos, is important in many computer vision applications, such as face recognition, human behavior analysis, robotics, intelligent surveillance, intelligent transportation systems, and human-computer interaction.
Although research on visual tracking algorithms has lasted for decades, visual tracking remains a challenging problem because of factors such as pose variation, illumination changes, partial occlusion, fast motion, scale variation, and background clutter.
Compared to the circulant structure of tracking-by-detection with kernels (CSK), the … scheme approach reduces the average center location error (CLE) by 13.2 pixels and improves the average success rate (SR) by 1.4% and the average distance precision (DP) by 7.7%.
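The three evaluation measures quoted here (CLE, SR, and DP) can be computed per sequence as in the sketch below. It assumes conventions common in tracking benchmarks rather than anything stated in this summary: boxes given as (x, y, width, height) with a top-left origin, DP counted as the fraction of frames whose CLE is at most 20 pixels, and SR counted as the fraction of frames whose intersection-over-union overlap exceeds 0.5. The function names and default thresholds are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def centre(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return x + w / 2, y + h / 2


def center_location_error(pred: Box, gt: Box) -> float:
    """Euclidean distance (pixels) between predicted and ground-truth centres."""
    (px, py), (gx, gy) = centre(pred), centre(gt)
    return math.hypot(px - gx, py - gy)


def overlap(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def evaluate(preds: Sequence[Box], gts: Sequence[Box],
             dp_threshold: float = 20.0,
             sr_threshold: float = 0.5) -> Dict[str, float]:
    """Average CLE (pixels), DP and SR (fractions of frames) for one sequence."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need one prediction per ground-truth frame")
    errors = [center_location_error(p, g) for p, g in zip(preds, gts)]
    overlaps = [overlap(p, g) for p, g in zip(preds, gts)]
    n = len(gts)
    return {
        "CLE": sum(errors) / n,
        "DP": sum(e <= dp_threshold for e in errors) / n,
        "SR": sum(o > sr_threshold for o in overlaps) / n,
    }
```

Averaging these per-sequence values over all test sequences gives one average CLE, DP, and SR per tracker, and two trackers are then compared by the differences between their averages. The figures reported above come from the authors' experiments and are not reproduced by this sketch.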
Summary
Visual tracking, as a fundamental step in exploring videos, is important in many computer vision applications, such as face recognition, human behavior analysis, robotics, intelligent surveillance, intelligent transportation systems, and human-computer interaction. Unlike generative trackers, discriminative methods [8,9,10,11,12,13] treat tracking as a classification problem that distinguishes the tracked targets from the background. Kalal et al. [15] propose a P-N learning algorithm that learns tracking classifiers from positive and negative samples. These methods are termed tracking-by-detection [16,17,18]: a binary classifier separates the target from the background in consecutive frames. The targets in the Skating and CarDark sequences undergo background clutter, illumination changes, and pose changes. In the Tiger sequence, shown in Figure 7(j), the object undergoes abrupt motion, pose variation, and partial occlusion.