Deep-Learning-Based Precision Visual Tracking

Xiaoming Peng, Yuxing Wei, Yufan Peng, Zhiyong Xu, Jianlin Zhang, Xiang Ji, Haorui Zuo

doi:10.1142/s0218001423520080

Abstract

In this paper, we target the “precision visual tracking” problem, that is, the precise tracking of a target point on a moving object. Compared with the abundant efforts in object-level tracking, which aims to accurately predict the bounding box (or mask) of an object, much less attention has been paid to precision visual tracking. To address this gap, we present a template tracking framework that consists of two parts: template matching and template updating. For template matching, we trained a deep architecture to directly estimate the projective transformation that maps the template onto the search image. For template updating, we devised a systematic strategy for updating both the initial and the new templates. To avoid drift build-up, we incorporate a fast-running dense correspondence matching module into the template update step. The proposed method was extensively tested on both synthetic and real data. For the synthetic data, we created a dataset of 480 image sequences, each accompanied by the ground-truth trajectory of a target point moving across all frames. On this dataset, we compared the proposed method with six competing approaches. The proposed method achieved the following tracking accuracy: around 44% of the tracking errors were less than one pixel, about 78% were less than three pixels, and only about 14% exceeded five pixels. The proposed method is also fast, running at 14 frames per second (fps) on [Formula: see text] image sequences on a workstation equipped with an NVIDIA RTX 2080 Ti graphics card. Qualitative results further show that the proposed method is applicable to real-world image sequences. The related code, pre-trained models, and test data will be made publicly available at https://github.com/XM-Peng/Precision-Visual-Tracking/ .
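The abstract describes two quantitative ideas without spelling them out: a target point is located in the search image through an estimated projective transformation (a 3x3 homography), and accuracy is reported as the share of tracking errors falling under or over pixel thresholds. The sketch below illustrates both under stated assumptions. It assumes the target point is transferred by applying the homography in homogeneous coordinates, and that the tracking error is the Euclidean distance between the predicted and ground-truth point in each frame. The names `transfer_point` and `error_summary` are hypothetical helpers, not the authors' code, and the strict `<` / `>` comparisons at 1, 3, and 5 pixels are an assumption about how the reported bands were computed.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def transfer_point(H: Matrix3, p: Point) -> Point:
    """Map a 2D point through a 3x3 projective transformation (homography).

    Hypothetical helper: lifts (x, y) to homogeneous (x, y, 1), multiplies
    by H, then divides by the third component to return to pixel coordinates.
    """
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        # The point lies on the line that H sends to infinity.
        raise ValueError("point maps to infinity under H")
    return (u / w, v / w)


def error_summary(pred: Sequence[Point], gt: Sequence[Point]) -> Dict[str, float]:
    """Fraction of frames whose Euclidean tracking error falls in each band.

    Hypothetical helper: thresholds mirror the 1/3/5-pixel bands quoted in
    the abstract; strict comparisons are an assumption.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("need equal-length, non-empty trajectories")
    errs = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    n = len(errs)
    return {
        "below_1px": sum(e < 1.0 for e in errs) / n,
        "below_3px": sum(e < 3.0 for e in errs) / n,
        "above_5px": sum(e > 5.0 for e in errs) / n,
    }


if __name__ == "__main__":
    # A pure translation by (+2, -1) moves (10, 20) to (12, 19).
    H_shift = [[1, 0, 2], [0, 1, -1], [0, 0, 1]]
    print(transfer_point(H_shift, (10, 20)))

    # Four frames with errors of 0.5, 2, 4 and 6 pixels.
    pred = [(0.0, 0.0)] * 4
    gt = [(0.5, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
    print(error_summary(pred, gt))
```

Because the bands overlap (every error under one pixel is also under three), the three fractions need not sum to one, which is consistent with the 44%, 78%, and 14% figures reported above.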
