Abstract

Siamese network-based trackers compute features for the object template and the search image independently, and the template features are not updated online during tracking. Maintaining tracking accuracy under interference such as background clutter, illumination variation, or partial occlusion in the search region is therefore challenging. To address such interference effectively and improve localization accuracy, this paper devises a Siamese residual attentional aggregation network (SiamRAAN) framework with self-adaptive implicit feature updating. First, SiamRAAN introduces Self-RAAN into the backbone network, applying residual self-attention to extract effective object features. Then, we introduce Cross-RAAN to update the template features online by focusing on the high-relevance parts of the feature extraction process for both the object template and the search image. Finally, a multilevel feature fusion module fuses the RAAN-enhanced feature information and improves the network's ability to perceive key features. Extensive experiments on benchmark datasets (GOT-10K, LaSOT, OTB-50, OTB-100 and UAV123) demonstrate that SiamRAAN delivers excellent performance and runs at 51 FPS across a range of challenging object tracking tasks. Code is available at https://github.com/MallowYi/SiamRAAN.
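The core idea behind Cross-RAAN, as described above, is to refine the template features online by attending to high-relevance content in the search region. A minimal NumPy sketch of such a residual cross-attention update is shown below; it is illustrative only (the function name, shapes, and single-head form are assumptions, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_update(template, search):
    """Residual cross-attention: template tokens query search tokens.

    template: (Nt, d) template features; search: (Ns, d) search-region features.
    Returns template features refined with high-relevance search content,
    with a residual connection preserving the original template cues.
    """
    d = template.shape[-1]
    # (Nt, Ns) relevance weights between template and search tokens.
    attn = softmax(template @ search.T / np.sqrt(d))
    # Residual update: original features plus attended search content.
    return template + attn @ search

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 32))   # template tokens
x = rng.standard_normal((64, 32))   # search-region tokens
z_new = cross_attention_update(z, x)
print(z_new.shape)  # (16, 32)
```

In practice the paper's module also applies self-attention (Self-RAAN) within the backbone and fuses multiple feature levels; this sketch only conveys the cross-branch residual update.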
