Abstract
Most current thermal infrared (TIR) trackers rely on feature matching between the search image and a fixed template cropped from the first frame. Some Siamese-based TIR trackers with a template update mechanism use correlation filters to introduce historical prediction information along the temporal dimension. However, their feature representations are not robust enough to handle target scale variation, appearance change, and occlusion. To address this challenge, we explore a novel spatio-temporal fusion Transformer (STFT) model for robust TIR object tracking. Our approach uses a Transformer-based encoder–decoder to fuse spatio-temporal information. Specifically, we design a dynamic template update strategy based on a salient points feature (SPF) representation, which allows the model to leverage the most powerful spatio-temporal information by retrieving multiple salient points on the target image. To further strengthen the dynamic template update strategy, we propose an IoU-Aware target state estimation head that uses a joint representation of target classification and localization, and we develop an IoU-Aware criterion for estimating the quality of the dynamic template. The proposed STFT-Net is evaluated on three challenging benchmarks, and extensive experimental results show superior performance compared with well-regarded tracking algorithms. The code is available at https://github.com/qinxin-wh/STFT-Net.
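The idea of gating dynamic template updates on a quality estimate can be illustrated with a minimal sketch. This is not the STFT-Net implementation: the class `DynamicTemplate`, the `threshold` and `interval` values, and the update rule are all hypothetical, chosen only to show how a quality score in [0, 1] might decide whether a new prediction replaces the stored template. The `iou` helper shows the box-overlap measure that an IoU-aware score is meant to reflect.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class DynamicTemplate:
    """Hypothetical quality-gated template store (illustration only)."""
    threshold: float = 0.7      # assumed minimum quality score for an update
    interval: int = 10          # assumed number of frames between update checks
    box: Optional[Box] = None   # box of the currently stored dynamic template
    updated_at: int = -1        # frame index of the last accepted update

    def maybe_update(self, frame_idx: int, box: Box, quality: float) -> bool:
        """Replace the template only on scheduled frames with a high score."""
        due = frame_idx % self.interval == 0
        if due and quality >= self.threshold:
            self.box = box
            self.updated_at = frame_idx
            return True
        return False


if __name__ == "__main__":
    template = DynamicTemplate()
    prediction = (10.0, 10.0, 50.0, 60.0)
    # A confident prediction on a scheduled frame replaces the template.
    print(template.maybe_update(10, prediction, quality=0.9))
    # A low-quality prediction (for example, under occlusion) is rejected.
    print(template.maybe_update(20, prediction, quality=0.3))
```

In this sketch a low score keeps the previous template in place, which is one simple way to avoid contaminating the template with occluded or drifted predictions; the criterion in the paper may differ.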