Abstract
RGBT tracking is developing rapidly because RGB and thermal frames offer complementary advantages. Existing high-accuracy methods track at low speed, and they do not make full use of the hierarchical information available during feature extraction or of the historical information in the sequences. To address these issues, a novel dual-modality space-time memory (DMSTM) network is proposed for robust RGBT tracking. Specifically, DMSTM is divided into three modules. The first module is the dual-modality backbone, which utilizes both shallow and deep information by aggregating the feature maps produced at each change of dimension during downsampling. The second module is the space-time memory reader with bimodal fusion, which aggregates features of historical and current frames to share information in the time domain. The last module is the siamese head network, which computes the sum of the prediction losses of the two modalities and back-propagates it. This avoids the degradation in tracking performance caused by sequence frame pairs in which the training targets are not perfectly aligned. Extensive experiments on three RGBT benchmark datasets show that the performance and efficiency of the proposed DMSTM exceed those of state-of-the-art methods while running at 27.6 FPS.
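The space-time memory reader described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the fusion operator or the attention form, so the channel-wise concatenation of the two modalities, the scaled dot-product softmax read, and the function names `memory_read` and `bimodal_memory_read` are all assumptions made for illustration.

```python
import numpy as np


def memory_read(mem_key, mem_val, qry_key):
    """Retrieve one memory readout per current-frame location.

    mem_key: (N_mem, C_k) keys of the stored historical-frame locations
    mem_val: (N_mem, C_v) values of the same locations
    qry_key: (N_qry, C_k) keys of the current-frame locations
    returns: (N_qry, C_v)
    """
    # Similarity between every query location and every memory location.
    sim = qry_key @ mem_key.T / np.sqrt(mem_key.shape[1])
    # Softmax over the memory axis, shifted for numerical stability.
    sim = sim - sim.max(axis=1, keepdims=True)
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each readout is a weighted average of the memory values.
    return weights @ mem_val


def bimodal_memory_read(mem_rgb, mem_tir, qry_rgb, qry_tir):
    """Fuse RGB and thermal features, then read from the fused memory."""
    # Assumed fusion: concatenate the two modalities along the channel axis.
    mem = np.concatenate([mem_rgb, mem_tir], axis=1)
    qry = np.concatenate([qry_rgb, qry_tir], axis=1)
    # For brevity the fused features serve as both keys and values.
    readout = memory_read(mem, mem, qry)
    # Combine historical information with the current-frame features.
    return np.concatenate([readout, qry], axis=1)
```

With 12 memory locations, 5 query locations, and 4 channels per modality, `bimodal_memory_read` returns an array of shape `(5, 16)`: 8 fused channels read from memory plus 8 fused channels from the current frame. A real tracker would use learned key and value projections on spatial feature maps rather than raw flattened features.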