Siamese trackers are widely used in object tracking. Aerial remote sensing tracking, however, faces challenges such as scale variation, viewpoint change, background clutter and occlusion, and most existing Siamese trackers rely on single-scale, local features, which makes accurate aerial tracking difficult. To address this problem, we propose a global multi-scale optimization and prediction head attentional Siamese network. First, a transformer-based multi-scale and global feature encoder (TMGFE) is proposed to obtain globally optimized multi-scale features. Second, a prediction head attentional module (PHAM) is proposed to add context information to the prediction head by adaptively adjusting the spatial position and channel contributions of the response map. Benefiting from these two components, the proposed tracker mitigates the above challenges of aerial remote sensing tracking to some extent and improves tracking performance. We conduct ablation experiments on four aerial tracking benchmarks (UAV123, UAV20L, UAV123@10fps and DTB70) to verify the effectiveness of the proposed network. We also compare our tracker with several state-of-the-art (SOTA) trackers on four benchmarks to verify its superior performance. The tracker runs at 40.8 fps on an RTX 3060 Ti GPU.
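As a rough illustration of what "adjusting the spatial position and channel contribution of the response map" can mean, the following minimal sketch reweights a response map with one gate per channel and one gate per spatial position. It is not the PHAM described in this work: the function names (`channel_gates`, `spatial_gates`, `reweight`), the use of plain averages as the pooling step, and the absence of any learned layers are all assumptions made to keep the example small.

```python
import math


def sigmoid(x):
    """Squash a real value into (0, 1) so it can act as a gate."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(resp):
    """Hypothetical channel gate: sigmoid of each channel's global average.

    resp is a nested list of shape [C][H][W].
    """
    gates = []
    for channel in resp:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        gates.append(sigmoid(total / count))
    return gates


def spatial_gates(resp):
    """Hypothetical spatial gate: sigmoid of the cross-channel mean per position."""
    c, h, w = len(resp), len(resp[0]), len(resp[0][0])
    return [
        [sigmoid(sum(resp[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def reweight(resp):
    """Scale every response value by its channel gate and its spatial gate."""
    cg = channel_gates(resp)
    sg = spatial_gates(resp)
    c, h, w = len(resp), len(resp[0]), len(resp[0][0])
    return [
        [[resp[k][i][j] * cg[k] * sg[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


# Toy response map with 2 channels of size 2x2 (illustrative values only).
response = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
reweighted = reweight(response)
```

In a trained module the gates would come from learned layers rather than fixed averages; the sketch only shows how channel-wise and position-wise weights combine multiplicatively on the response map while preserving its shape.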