Abstract

Object tracking aims to estimate the position of a given object in the subsequent frames of a video sequence. Feature fusion is one of the main research focuses in tracking, because the similarity response maps it produces strongly affect tracking accuracy. However, traditional naive correlation and depthwise correlation blur spatial information and perform poorly in scenarios such as low resolution, similar objects, and partial occlusion. In this paper, we propose a progressive attention tracker, PRAT, which performs thorough similarity learning between the template and the search region to achieve more accurate object tracking. Specifically, PRAT applies self-enhancement to the template features and uses unidirectional cross enhancement and progressive enhancement to fuse the template features into the search features, so that the search region features become target-aware. In addition, we design a convolution-based network to replace the feed-forward network (FFN) of the original Transformer in order to enhance local semantics. Experiments on six challenging benchmarks show that PRAT achieves state-of-the-art performance. In particular, on the challenging UAV123 benchmark, PRAT sets a new record with a SUC score of 0.703. PRAT runs at 63 fps on a GPU.
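The fusion scheme summarized above can be illustrated with a minimal NumPy sketch. It is not the PRAT implementation, and the abstract does not give its details, so several parts are assumptions. Attention is single-head and has no learned query/key/value projections or normalization layers. The "convolution-based FFN" is stood in for by a hypothetical 3x3 depthwise convolution with ReLU and a residual connection, using random kernels in place of learned weights. "Progressive enhancement" is interpreted here as stacking fusion layers. All function names (`fusion_layer`, `conv_ffn`, `progressive_fusion`) are hypothetical. The sketch does show the key structural idea: the template is enhanced by self-attention, and the search tokens then query the template (unidirectional cross attention) while the template is never updated from the search region.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention over token sequences of shape (N, C).
    # Assumption: no learned Q/K/V projections, to keep the sketch short.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def conv_ffn(tokens, h, w, kernel):
    # Hypothetical convolutional FFN: reshape tokens to an (h, w, C) map, apply a
    # 3x3 depthwise convolution with zero padding, then ReLU and a residual connection.
    c = tokens.shape[-1]
    fmap = tokens.reshape(h, w, c)
    padded = np.pad(fmap, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(fmap)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx]
    return tokens + np.maximum(out, 0.0).reshape(h * w, c)

def fusion_layer(z, x, z_hw, x_hw, kz, kx):
    # 1) Template self-enhancement: template tokens attend to themselves.
    z = z + attention(z, z, z)
    z = conv_ffn(z, *z_hw, kz)
    # 2) Unidirectional cross enhancement: search tokens query the template;
    #    the template is NOT updated from the search region.
    x = x + attention(x, z, z)
    x = conv_ffn(x, *x_hw, kx)
    return z, x

def progressive_fusion(z, x, z_hw, x_hw, num_layers, rng):
    # 3) Progressive enhancement, assumed here to mean stacking layers so that each
    #    layer fuses the already-enhanced template into the already-enhanced search tokens.
    c = z.shape[-1]
    for _ in range(num_layers):
        kz = rng.standard_normal((3, 3, c)) * 0.1  # random stand-ins for learned kernels
        kx = rng.standard_normal((3, 3, c)) * 0.1
        z, x = fusion_layer(z, x, z_hw, x_hw, kz, kx)
    return x

# Usage: an 8x8 template map and a 16x16 search map, each flattened to tokens.
rng = np.random.default_rng(0)
C = 16
z = rng.standard_normal((8 * 8, C))
x = rng.standard_normal((16 * 16, C))
fused = progressive_fusion(z, x, (8, 8), (16, 16), num_layers=4, rng=rng)
print(fused.shape)  # (256, 16)
```

Because the cross attention runs in one direction only, the fused output keeps the search-region resolution, which is what a downstream classification and regression head would consume. The template features could also be computed once per sequence and reused across frames.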
