Abstract

Many trackers use attention mechanisms to enhance the details of feature maps. However, most attention mechanisms are designed for RGB images and therefore do not adapt well to infrared images, whose features are weak and make attention difficult to learn. In addition, most thermal infrared trackers based on Siamese networks use traditional cross-correlation, which ignores the correlation between local parts. To address these problems, this paper proposes a Siamese multigroup spatial shift (SiamMSS) network for thermal infrared tracking. The SiamMSS network uses a spatial shift model to enhance the details of feature maps. First, the feature map is divided into four groups along the channel dimension, and the groups are shifted unit-wise in the four directions of the height and width dimensions. Next, the sample and search image features are cross-correlated using a graph attention module cross-correlation method. Finally, split attention is used to fuse the multiple response maps. Experiments on challenging benchmarks, including VOT-TIR2015, PTB-TIR, and LSOTB-TIR, demonstrate that the proposed SiamMSS outperforms state-of-the-art trackers. The code is available at lvlanbing/SiamMSS (github.com).
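As an illustration of the four-group spatial shift described above, the following is a minimal NumPy sketch and not the authors' implementation. The function name `spatial_shift`, the assignment of directions to groups, the shift distance of one position, and the choice to leave border rows and columns unchanged are all assumptions made for this example; the released SiamMSS code may differ on each point.

```python
import numpy as np


def spatial_shift(x: np.ndarray) -> np.ndarray:
    """Shift four channel groups of a (C, H, W) feature map by one position.

    Hypothetical helper for illustration only. Assumes C is divisible by 4
    and that border rows/columns keep their original values.
    """
    c, _, _ = x.shape
    if c % 4 != 0:
        raise ValueError("channel count must be divisible by 4")
    g = c // 4
    out = x.copy()
    # Group 0: shift down along height (row i receives row i-1).
    out[0 * g:1 * g, 1:, :] = x[0 * g:1 * g, :-1, :]
    # Group 1: shift up along height (row i receives row i+1).
    out[1 * g:2 * g, :-1, :] = x[1 * g:2 * g, 1:, :]
    # Group 2: shift right along width (column j receives column j-1).
    out[2 * g:3 * g, :, 1:] = x[2 * g:3 * g, :, :-1]
    # Group 3: shift left along width (column j receives column j+1).
    out[3 * g:4 * g, :, :-1] = x[3 * g:4 * g, :, 1:]
    return out


# Toy feature map: 4 channels (one per group), 3x3 spatial size.
features = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
shifted = spatial_shift(features)
```

After the shift, each spatial position holds channels that originated from its four neighbours, so a subsequent per-position (1x1) operation can mix local spatial context without any extra parameters.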
