Abstract

Target representations play an important role in the performance of Thermal Infrared (TIR) tracking. To learn better TIR representations, we propose a Cross-Modal Distillation method that distills TIR representations from the RGB modality, trained on a large amount of unlabeled paired RGB-TIR data in a self-supervised manner. By drawing on the powerful models available in the RGB modality, cross-modal distillation learns TIR-specific representations that improve TIR tracking. The proposed approach is a generic, independent component that can be conveniently incorporated into different baseline trackers. In practice, we explore three different approaches to generate paired RGB-TIR patches with the same semantics for self-supervised training, which makes the method easy to extend to even larger amounts of unlabeled training data. Our tracker outperforms the baseline tracker, achieving absolute gains of 2.3% in Success Rate, 2.7% in Precision, and 2.5% in Normalized Precision on published datasets.
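The abstract does not specify the distillation objective or the network architectures, so the following is only a minimal sketch of the general cross-modal distillation idea under loudly stated assumptions: a frozen linear "teacher" stands in for a pretrained RGB encoder, a linear "student" stands in for the TIR encoder, the paired patches are synthetic (each hypothetical TIR patch is a fixed per-channel rescaling of its RGB counterpart), and the loss is a mean-squared feature-matching term. All names here (`distill_loss`, `distill_step`, `make_pairs`, `run_demo`) are hypothetical. The point is that the student is trained from paired data alone, with no labels.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def distill_loss(student_W, teacher_W, pairs):
    """Mean squared distance between student features on TIR inputs and
    frozen teacher features on the paired RGB inputs (no labels used)."""
    total = 0.0
    for rgb, tir in pairs:
        s = matvec(student_W, tir)
        t = matvec(teacher_W, rgb)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(pairs)


def distill_step(student_W, teacher_W, pairs, lr):
    """One full-batch gradient-descent step on the student; the teacher is frozen."""
    rows, cols = len(student_W), len(student_W[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for rgb, tir in pairs:
        s = matvec(student_W, tir)
        t = matvec(teacher_W, rgb)
        for i in range(rows):
            # d/dW_ij of (s_i - t_i)^2, averaged over the batch
            err = 2.0 * (s[i] - t[i]) / len(pairs)
            for j in range(cols):
                grad[i][j] += err * tir[j]
    return [[student_W[i][j] - lr * grad[i][j] for j in range(cols)]
            for i in range(rows)]


def make_pairs(n, seed=0):
    """Hypothetical paired data: each TIR 'patch' is a fixed per-channel
    rescaling of its RGB counterpart, so both views share the same semantics."""
    rng = random.Random(seed)
    scale = [2.0, -1.0, 0.5]  # assumed modality shift, for illustration only
    pairs = []
    for _ in range(n):
        rgb = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        tir = [c * s for c, s in zip(rgb, scale)]
        pairs.append((rgb, tir))
    return pairs


def run_demo(steps=1000, lr=0.2):
    """Distill a TIR student from a frozen RGB teacher; return (loss_before, loss_after)."""
    # Stand-in for a pretrained RGB encoder (assumption: a fixed linear map).
    teacher = [[1.0, 0.5, 0.0], [0.0, 1.0, -0.5], [0.3, 0.0, 1.0]]
    student = [[0.0] * 3 for _ in range(3)]  # TIR encoder to be learned
    pairs = make_pairs(50)
    before = distill_loss(student, teacher, pairs)
    for _ in range(steps):
        student = distill_step(student, teacher, pairs, lr)
    after = distill_loss(student, teacher, pairs)
    return before, after


if __name__ == "__main__":
    before, after = run_demo()
    print(f"distillation loss: {before:.4f} -> {after:.2e}")
```

Because the teacher never sees a gradient, the student is pulled toward whatever representation the RGB model already produces for the same content; in a real tracker the linear maps would be deep encoders and the pairs would come from one of the patch-generation strategies the abstract mentions.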
