Abstract

Siamese trackers learn an appearance model of the target from the first frame and then exploit this model to locate the target in subsequent frames, during which the appearance model remains unchanged. Thanks to the powerful feature extraction capability of deep convolutional neural networks, Siamese trackers achieve strong performance. However, because the appearance model is never updated while the target's appearance keeps changing, tracking drift occurs frequently, especially in background-clutter scenarios. To tackle this issue, we propose a motion model and a discriminative model. First, a motion model of the target is constructed to determine whether tracking drift has occurred, since the target positions predicted by the motion model are temporally smooth whereas those predicted by the Siamese tracker may not be. In this way, temporal information supplements the Siamese tracker, which exploits only spatial information. Second, when tracking drift occurs, a discriminative model is learned to determine the final position of the target. Finally, a flexible update strategy for the discriminative model is presented. To demonstrate the generality of the proposed method, we apply it to two well-known Siamese trackers, SiamFC and SiamRPN_DW. Extensive experiments on the OTB2013, OTB2015, VOT2016, VOT2019, and GOT-10k benchmarks demonstrate that the proposed trackers outperform their baselines and achieve state-of-the-art performance, especially in background-clutter scenarios. To the best of our knowledge, we are the first to propose motion-guided Siamese trackers. Moreover, we will release our code to encourage more research in this direction.
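
The abstract does not specify the concrete form of the motion model, so the following is only a minimal sketch assuming a constant-velocity predictor over recent target centers; the class name and parameters are illustrative, not taken from the paper.

```python
# Illustrative sketch only: the paper does not give the exact motion model,
# so a simple constant-velocity predictor is assumed here.
import numpy as np

class ConstantVelocityModel:
    """Predicts the next target box by extrapolating recent center motion."""

    def __init__(self, history=5):
        self.history = history          # number of past centers to average over
        self.centers = []               # list of (cx, cy) target centers

    def update(self, box):
        """Record the center of the latest confirmed box (x, y, w, h)."""
        cx = box[0] + box[2] / 2.0
        cy = box[1] + box[3] / 2.0
        self.centers.append((cx, cy))
        self.centers = self.centers[-self.history:]

    def predict(self, box):
        """Return the box shifted by the average per-frame center velocity."""
        if len(self.centers) < 2:
            return box                  # not enough history: assume stationary
        deltas = np.diff(np.array(self.centers), axis=0)
        vx, vy = deltas.mean(axis=0)    # mean velocity over the history window
        return (box[0] + vx, box[1] + vy, box[2], box[3])
```

A predictor of this kind yields temporally smooth positions by construction, which is what makes its disagreement with the Siamese output usable as a drift signal.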

Highlights

  • Visual tracking is a research hotspot in the field of computer vision due to its wide range of application scenarios, ranging from surveillance, augmented reality, and autonomous driving [1] to robotics [2]

  • We can judge whether tracking drift has occurred by computing the Intersection over Union (IOU) between the target position predicted by the motion model and the one predicted by the Siamese tracker, since the former is temporally smooth while the latter may not be (a sketch follows this list)

  • If tracking drift occurs, a discriminative model of the target is learned using a Discriminative Correlation Filter (DCF) trained on the ground-truth sample in the first frame
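
As a minimal sketch of the drift test described in the second highlight, the IOU of the two predicted boxes can be compared against a threshold; the threshold `tau` below is a hypothetical hyper-parameter, not a value reported by the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def drift_detected(motion_box, siamese_box, tau=0.5):
    """Flag tracking drift when the two predictions disagree too much.

    `tau` is an assumed threshold for illustration only.
    """
    return iou(motion_box, siamese_box) < tau
```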


Summary

INTRODUCTION

Visual tracking is a research hotspot in the field of computer vision due to its wide range of application scenarios, ranging from surveillance, augmented reality, and autonomous driving [1] to robotics [2]. We determine whether tracking drift has occurred by exploiting the fact that the target positions predicted by the motion model are temporally smooth, whereas those predicted by the Siamese tracker may not be; in this way, temporal information supplements the Siamese tracker, which exploits only spatial information. When tracking drift occurs, a discriminative model of the target is learned using a Discriminative Correlation Filter (DCF) trained on the ground-truth sample in the first frame.
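
The introduction names a Discriminative Correlation Filter trained on the first-frame ground-truth sample. As a hedged illustration, a minimal MOSSE-style single-channel DCF can be trained in closed form in the frequency domain; the paper's actual features, regularization, and update rule may differ.

```python
# A minimal MOSSE-style correlation filter, shown only to illustrate how a
# DCF can be trained on the ground-truth sample of the first frame.
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak centered on the target."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma ** 2))

def train_dcf(patch, sigma=2.0, lam=1e-2):
    """Solve for the filter that maps the first-frame patch to the label.

    `patch` is a grayscale crop of the ground-truth target region;
    `lam` is a ridge-regularization term to avoid division by zero.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_label(patch.shape, sigma))
    return (G * np.conj(F)) / (F * np.conj(F) + lam)  # closed-form solution

def respond(H, patch):
    """Correlate a new search patch with the learned filter."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
```

The peak of the response map returned by `respond` then gives the discriminative model's estimate of the target position when drift is detected.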

