SiamADT: Siamese Attention and Deformable Features Fusion Network for Visual Object Tracking

Fasheng Wang, Ping Cao, Xing Wang, Bing He, Fuming Sun

doi:10.1007/s11063-023-11290-5

Abstract

To date, Siamese-based trackers have achieved excellent performance. However, in some complex scenarios, deep convolutional layers alone cannot effectively capture sufficiently representative features. To address this problem, we propose a Siamese Attention and Deformable features fusion network for visual object Tracking (SiamADT). SiamADT consists of three modules: a Siamese attention network module that extracts attention features, a deformable features fusion module, and a classification-regression module that predicts the bounding box. The framework uses ResNet-50 as the backbone and tracks in an anchor-free manner. Because it requires no tricky anchor hyperparameter tuning or manual intervention, SiamADT is more flexible and versatile. We conduct extensive experiments on four challenging benchmark datasets. The results demonstrate that SiamADT achieves competitive performance among state-of-the-art methods while running in real time at 30 frames per second.
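The abstract does not give the equations of the classification-regression module. To illustrate what "anchor-free" typically means in this setting, the sketch below assumes an FCOS-style formulation that is common among anchor-free Siamese trackers. The classification branch scores every feature-map location. The regression branch predicts the distances (l, t, r, b) from that location to the left, top, right, and bottom box edges, so no anchor boxes or anchor hyperparameters are needed. The function name `decode_anchor_free`, the `stride`, and the `offset` are hypothetical and need not match SiamADT's actual head.

```python
from typing import List, Tuple


def decode_anchor_free(
    cls_map: List[List[float]],
    reg_map: List[List[Tuple[float, float, float, float]]],
    stride: int = 8,
    offset: float = 0.0,
) -> Tuple[float, float, float, float]:
    """Return the box (x1, y1, x2, y2) predicted at the highest-scoring location.

    Assumed (hypothetical) FCOS-style layout, not taken from the paper:
      cls_map[i][j] : foreground score at feature-map row i, column j
      reg_map[i][j] : (l, t, r, b) distances, in image pixels, from that
                      location to the left, top, right and bottom box edges
      stride, offset: mapping from feature-map cells to image pixels
    """
    # Classification branch: choose the location most likely to be the target.
    best_i, best_j = max(
        ((i, j) for i in range(len(cls_map)) for j in range(len(cls_map[0]))),
        key=lambda ij: cls_map[ij[0]][ij[1]],
    )
    # Map the chosen feature-map cell back to image coordinates.
    x = offset + best_j * stride
    y = offset + best_i * stride
    # Regression branch: expand the point by the predicted edge distances.
    l, t, r, b = reg_map[best_i][best_j]
    return (x - l, y - t, x + r, y + b)


if __name__ == "__main__":
    cls = [[0.1, 0.2], [0.3, 0.9]]
    reg = [[(1.0, 1.0, 1.0, 1.0)] * 2, [(1.0, 1.0, 1.0, 1.0), (4.0, 2.0, 6.0, 8.0)]]
    print(decode_anchor_free(cls, reg, stride=8))
```

In this toy example the peak score sits at cell (1, 1), which maps to image point (8, 8); subtracting and adding the predicted distances gives the box (4.0, 6.0, 14.0, 16.0). Because each location regresses the box directly, the only design choices left are the stride and offset, rather than anchor scales and aspect ratios.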

Full Text