Abstract

Most trackers rely solely on the first frame as a template to locate the target in subsequent frames. However, objects may undergo occlusion and deformation over time, so the original snapshot of the object can no longer accurately reflect its current appearance, which greatly limits tracker performance. In this paper, we propose a novel Siamese tracking algorithm with a symmetric structure, called SiamRDT, which captures the latest appearance and motion states of objects through additional reliable dynamic templates. The model decides whether to update the dynamic template according to a quality estimation score, employs an attention mechanism to enhance the reliability of the dynamic template, and adopts depth-wise correlation to integrate the initial template, the dynamic template, and the search area. By combining reliable dynamic templates with a credible initial template, the model fuses initial-state information with the latest-state information of objects. We conduct extensive ablation experiments to illustrate the effectiveness of the proposed key components, and the tracker achieves very competitive results on four large-scale tracking benchmarks, namely OTB100, GOT-10k, LaSOT, and TrackingNet. Our tracker achieves an AO score of 61.3 on GOT-10k, a precision score of 56.5 on LaSOT, a precision score of 69.3 on TrackingNet, and a precision score of 90.5 on OTB100.
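As a rough illustration of two ideas named above, the following minimal sketch shows depth-wise (channel-by-channel) cross-correlation and a score-gated template update. It is not the SiamRDT implementation: the function names, the threshold value of 0.5, and the element-wise sum used to fuse the two response maps are all assumptions made for this example, and the attention step is omitted.

```python
def depthwise_xcorr(search, template):
    """Correlate each search channel with the matching template channel.

    search:   list of C feature maps, each an H x W list of lists
    template: list of C kernels, each an h x w list of lists (h <= H, w <= W)
    Returns C response maps of size (H - h + 1) x (W - w + 1).
    """
    assert len(search) == len(template), "channel counts must match"
    responses = []
    for s, k in zip(search, template):
        H, W = len(s), len(s[0])
        h, w = len(k), len(k[0])
        resp = [
            [
                sum(s[i + di][j + dj] * k[di][dj]
                    for di in range(h) for dj in range(w))
                for j in range(W - w + 1)
            ]
            for i in range(H - h + 1)
        ]
        responses.append(resp)
    return responses


def maybe_update_template(dynamic, candidate, quality, threshold=0.5):
    """Replace the dynamic template only when the quality score is high enough.

    The threshold is a hypothetical value chosen for illustration.
    """
    return candidate if quality >= threshold else dynamic


def fuse_responses(resp_a, resp_b):
    """Fuse two sets of response maps by element-wise sum (an assumed rule)."""
    return [
        [[a + b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(map_a, map_b)]
        for map_a, map_b in zip(resp_a, resp_b)
    ]


# Usage: one-channel toy features for the search area and both templates.
search = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
initial_template = [[[1, 0], [0, 1]]]
dynamic_template = [[[0, 1], [1, 0]]]
candidate = [[[1, 1], [1, 1]]]

# A low quality score keeps the existing dynamic template.
dynamic_template = maybe_update_template(dynamic_template, candidate, quality=0.2)

fused = fuse_responses(
    depthwise_xcorr(search, initial_template),
    depthwise_xcorr(search, dynamic_template),
)
```

Keeping the initial template fixed while gating updates to the dynamic one means a single poor frame cannot overwrite the only trusted reference, which is the motivation stated in the abstract.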
