Abstract

Aiming at the tracking failure due to the disappearance of the target in the long-term target tracking process, this paper proposes a long-term target tracking network based on the visual transformer and template update. First of all, we construct a feature extraction network based on the transformer and adopt a knowledge distillation strategy to improve the effectiveness of the network for global feature extraction. Secondly, in the modeling transformer, the target features are fully fused with the search area features by using encoder, and the position information in the target query is learned by the decoder. Then, target predictions are performed on the information from the encoder–decoder to obtain tracking results. Meanwhile, we design a score head model to judge the validity of the dynamic template of the current frame before tracking in the next frame. We select the appropriate dynamic template for the tracking of the next frame according to the score result. In this paper, we performed extensive experiments on LaSOT, VOT2021-LT, TrackingNet, TLP, and UAV123 datasets, and the experimental results prove the effectiveness of our method. In particular, it exceeds STARK by 0.8 % (F score) on VOT2021-LT, 1.0 % (S score) on LaSOT, and TrackingNet exceed STARK by 1.1 % (NP score), which also demonstrates the superiority of the method in this paper.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.