Abstract

RGBT tracking is a challenging task that requires robust fusion of visible (RGB) and thermal infrared (TIR) images to handle scenarios such as illumination changes, occlusions, and camouflage. Most popular RGBT trackers are currently based on two-stream Siamese architectures. They tend to extract features from the template and search regions separately, which neglects the relationship between target and background. Moreover, they do not fuse RGB and TIR features properly, limiting their ability to exploit complementary information. To address these issues, we introduce a highly compact adaptive transformer-based network that unifies feature extraction and correlation between the template and search images. The compact network has dual branches, which combine feature extraction and correlation for the template and search images of the different modalities. In addition, we introduce a cross-modal weight redistribution (CMWR) module for multi-modal fusion. This adaptive fusion scheme learns discriminative features of the RGB and TIR data and assigns weights to them, enabling the two modalities to complement each other. Furthermore, to track targets of different scales, we design a scale-adaptive optimization pyramid (SAOP) module that adapts to objects of different sizes. Our method achieves strong performance on the GTOT, RGBT234, and LasHeR datasets, surpassing most existing methods. The results are consistent across these datasets, demonstrating the effectiveness of our approach. Our code is released at https://github.com/ELOESZHANG/HCANet.
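
The abstract does not give the internals of the CMWR module, so the following is only a minimal sketch of the general idea of weighting two modalities so that they complement each other, not the authors' implementation. It assumes each modality is summarized by a plain feature vector, and it uses mean absolute activation as a hypothetical stand-in for a learned scoring layer. The names `modality_score` and `fuse_modalities` are illustrative and do not come from the paper or its code.

```python
import math


def modality_score(features):
    # Assumption: mean absolute activation stands in for a learned
    # scoring layer; a real module would learn this from data.
    return sum(abs(x) for x in features) / len(features)


def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(rgb, tir):
    # Hypothetical fusion: weight each modality by its normalized score,
    # then take the element-wise weighted sum of the two vectors.
    if len(rgb) != len(tir):
        raise ValueError("feature vectors must have the same length")
    w_rgb, w_tir = softmax([modality_score(rgb), modality_score(tir)])
    fused = [w_rgb * r + w_tir * t for r, t in zip(rgb, tir)]
    return fused, (w_rgb, w_tir)
```

Under these assumptions, calling `fuse_modalities([0.1, 0.1], [2.0, -2.0])` gives the TIR vector the larger weight, which loosely mirrors a low-light scene where the thermal channel carries more signal than the visible one.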
