Abstract

RGBT tracking is attracting increasing attention because of its potential for robust all-weather tracking. RGB and thermal source data contain different levels of information about the object, and exploiting their complementary advantages can effectively improve tracking performance. Existing work focuses on the extraction and fusion of multi-modal features. Although these methods effectively fuse information across modalities, they ignore the potential value of multi-level cues shared between the modalities. In addition, they cannot provide effective candidate boxes after tracking drift, which limits tracker performance. In this paper, we propose a cross-modality interaction and re-identification network that performs multi-level modality-shared, modality-specific, and object probability prediction learning. We design two feature extraction sub-networks, a multi-level modality-shared fusion network and a modality-complementary sub-network, which extract and fuse multi-level modality-shared information and modality-specific information, respectively. To mitigate tracking drift, we design object-aware branches that predict the object-centered state; this design is simple and efficient. Moreover, to meet the real-time requirement of visual tracking, we design an object regression branch that does not require repeated region-proposal inputs. Extensive experiments and comparisons with state-of-the-art trackers on RGBT tracking benchmark datasets show that our tracker achieves leading performance at near real-time speed, and that tracking drift caused by occlusion, fast motion, and camera motion is significantly reduced.
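To make the two-branch design concrete, the following is a minimal PyTorch sketch of combining a modality-shared branch (shared weights applied to both RGB and thermal inputs) with modality-specific branches (separate weights per modality). It is an illustration of the general idea only, not the paper's implementation; all module names, layer choices, and channel sizes are hypothetical.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Sketch of a two-branch RGBT feature extractor: a shared branch
    reused across modalities plus modality-specific branches, whose
    outputs are concatenated into one fused representation."""
    def __init__(self, in_channels=3, feat_channels=64):
        super().__init__()
        # Modality-specific branches (separate weights per modality).
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU())
        self.tir_branch = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU())
        # Modality-shared branch (the same weights process both inputs),
        # followed by a 1x1 conv that fuses the two shared feature maps.
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU())
        self.shared_fuse = nn.Conv2d(2 * feat_channels, feat_channels, 1)

    def forward(self, rgb, tir):
        # Modality-specific cues: concatenate the per-modality features.
        specific = torch.cat([self.rgb_branch(rgb), self.tir_branch(tir)], dim=1)
        # Modality-shared cues: run both inputs through the shared branch,
        # then fuse them.
        shared = self.shared_fuse(
            torch.cat([self.shared(rgb), self.shared(tir)], dim=1))
        # Combine shared and specific information.
        return torch.cat([specific, shared], dim=1)

# Example: fuse a pair of 128x128 RGB/thermal patches.
x_rgb = torch.randn(1, 3, 128, 128)
x_tir = torch.randn(1, 3, 128, 128)
feat = TwoBranchFusion()(x_rgb, x_tir)
print(feat.shape)  # torch.Size([1, 192, 128, 128])
```

A downstream tracking head (e.g., the object-aware and regression branches described above) would then operate on the fused feature map.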
