Abstract

Visual object tracking is a fundamental problem in computer vision and has improved greatly with the rapid development of deep learning. However, existing tracking methods that rely on a single model update strategy cannot guarantee robustness in complex scenes. In this paper, we propose a real-time long-short-term multi-model tracking method. Because fusing long-term and short-term features captures more spatiotemporal information, three models with different update periods are designed to learn long-term and short-term features and thereby improve tracking robustness. In addition, hierarchical features, comprising deep convolutional features and handcrafted features, are used to represent the current object, which further improves tracking accuracy by providing richer semantic information. Finally, to correct the inaccurate prediction of object position caused by the cosine window in the correlation filter, a bounding-box regression strategy is introduced to refine the final object position. Extensive experiments on the OTB, VOT, TC128, and UAV123 datasets demonstrate that the proposed method performs favorably against state-of-the-art algorithms while running at 24 fps.
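The idea of keeping several models with different update periods can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PeriodicModel`, the periods (1, 5, 25), the learning rate, and the linear-interpolation update rule are all assumptions chosen for illustration, since this summary does not state them.

```python
import numpy as np

class PeriodicModel:
    """Hypothetical appearance model updated only every `period` frames."""

    def __init__(self, template, period, lr=0.01):
        self.template = np.asarray(template, dtype=float).copy()
        self.period = period  # assumed update interval, in frames
        self.lr = lr          # assumed interpolation rate

    def maybe_update(self, frame_idx, new_template):
        # Update by linear interpolation, but only on this model's schedule.
        if frame_idx % self.period == 0:
            self.template = (1 - self.lr) * self.template + self.lr * new_template
            return True
        return False

# Three models: short-term (every frame) through long-term (every 25 frames).
initial = np.zeros(4)
models = [PeriodicModel(initial, period=p) for p in (1, 5, 25)]
update_counts = [0, 0, 0]

for frame_idx in range(1, 51):
    observation = np.ones(4)  # stand-in for features extracted at the new position
    for i, model in enumerate(models):
        update_counts[i] += model.maybe_update(frame_idx, observation)
```

Under these assumptions the short-term model adapts quickly to appearance changes, while the long-term model changes slowly and so is less affected by brief occlusion or drift.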

Highlights

  • Visual object tracking has become an important topic in computer vision in recent years; it can be applied to video surveillance, traffic monitoring, and many other applications, and has therefore attracted increasing attention from researchers

  • Inspired by the use of multiple filters to improve the performance of trackers based on the kernelized correlation filter (KCF), the proposed method uses four correlation filters to estimate the object position by searching for the maximum response value in the correlation map

  • CF2 uses only deep features and KCF uses only histogram of oriented gradients (HOG) features; our results show that combining deep, HOG, and color names (CN) features gives better representational capability
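Locating the object by searching for the maximum response in the correlation map can be sketched with a single linear correlation filter trained in the Fourier domain. This is a simplified, MOSSE-style illustration rather than the paper's tracker: it uses one raw-intensity channel, no kernel, no cosine window, and no deep, HOG, or CN features. The function names and the values of `sigma` and `lam` are assumptions.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, label, lam=1e-3):
    """Closed-form ridge-regression filter (complex conjugate form)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return G * np.conj(F) / (F * np.conj(F) + lam)

def detect(filter_conj, patch):
    """Correlate the filter with a patch and return the map and its peak."""
    response = np.real(np.fft.ifft2(filter_conj * np.fft.fft2(patch)))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, peak

rng = np.random.default_rng(0)
target = rng.standard_normal((32, 32))            # stand-in for first-frame features
H = train_filter(target, gaussian_label(target.shape))

# Simulate object motion as a cyclic shift of 3 rows down and 5 columns left.
shifted = np.roll(target, shift=(3, -5), axis=(0, 1))
_, peak = detect(H, shifted)

center = (target.shape[0] // 2, target.shape[1] // 2)
displacement = (int(peak[0]) - center[0], int(peak[1]) - center[1])
```

The peak of the response map moves by the same offset as the object, so `displacement` recovers the simulated shift. A tracker that runs several such filters would combine their response maps before taking the maximum; how the proposed method does so is not described in this summary.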

Summary

INTRODUCTION

Visual object tracking has become an important topic in computer vision in recent years; it can be applied to video surveillance, traffic monitoring, and many other applications, and has therefore attracted increasing attention from researchers. Correlation filter-based methods use the ground truth in the first frame to train the correlation filter. They correlate the features of a candidate area with the filter to obtain a response map, and the area with the maximum response value is chosen as the predicted position. KCF, which uses HOG features, cannot describe the object's characteristics as fully as deep features can. For this reason, combining the fast online learning of the correlation filter (CF) [12], [13] with the discriminative power of deep features has become popular. We use RCNN [14] with the ground truth in the first frame to train the regression model and obtain the final predicted location, which fits the ground truth more tightly.
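The bounding-box regression mentioned above can be illustrated with the standard R-CNN target parametrization, in which a regressor learns scale-invariant centre offsets and log-scale size ratios between a proposal box and the ground-truth box. Whether the authors use exactly this parametrization is not stated in this summary, so the sketch below is an assumption; the function names and the (cx, cy, w, h) box layout are chosen for illustration.

```python
import numpy as np

def encode(proposal, ground_truth):
    """Regression targets that map a proposal box onto the ground truth.

    Boxes are (cx, cy, w, h): centre coordinates, width, height.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return np.array([
        (gx - px) / pw,   # centre shift, normalised by proposal width
        (gy - py) / ph,   # centre shift, normalised by proposal height
        np.log(gw / pw),  # log width ratio
        np.log(gh / ph),  # log height ratio
    ])

def decode(proposal, targets):
    """Apply predicted targets to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = targets
    return np.array([px + pw * tx, py + ph * ty, pw * np.exp(tw), ph * np.exp(th)])

proposal = np.array([50.0, 40.0, 20.0, 10.0])      # e.g. the correlation-filter estimate
ground_truth = np.array([54.0, 38.0, 24.0, 12.0])  # first-frame annotation

targets = encode(proposal, ground_truth)
refined = decode(proposal, targets)                # recovers the ground-truth box
```

In a tracker, `targets` would be predicted by a regressor trained on first-frame samples, and `decode` would then refine the position estimated by the correlation filters.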

RELATED WORK
CORRELATION FILTERS
BOUNDING-BOX REGRESSION
LONG-SHORT-TERM UPDATE STRATEGY
EXPERIMENTS
EVALUATION ON UAV123
CONCLUSION