Adaptive Multi-Feature Fusion Visual Target Tracking Based on Siamese Neural Network with Cross-Attention Mechanism

Qian Zhou, Hongzheng Yan, Shidong Chen, Ming Yang, Haoran Xia

doi:10.1109/ccgrid54584.2022.00040

Abstract

We present SiamAtten, an adaptive multi-feature fusion visual object tracking algorithm based on a Siamese neural network with a cross-attention mechanism, designed to handle large appearance changes, complex backgrounds, and interference. The proposed network consists of two parts: a fully convolutional feature extraction network based on the cross-attention mechanism, and a region proposal generation network based on adaptive multi-feature fusion. The cross-attention mechanism in the first part strengthens the feature response to the target. The adaptive feature fusion method in the second part infers the target location step by step and obtains a robust region proposal by regression. Network parameters are reduced by using depth-wise separable convolution, and the proposed cross-attention mechanism enhances target identification ability and improves robustness. Extensive experiments are carried out on three benchmark datasets, and advanced tracking results are obtained.
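The parameter reduction attributed to depth-wise separable convolution can be illustrated with a simple weight count. The following is a minimal sketch, not the authors' implementation: the function names and the layer sizes (a 3 x 3 kernel with 256 input and 256 output channels) are assumptions chosen for illustration, and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a depth-wise k x k convolution followed by a
    1 x 1 point-wise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen for illustration only.
k, c_in, c_out = 3, 256, 256
standard = conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
ratio = separable / standard   # equals 1 / c_out + 1 / k**2

print(standard, separable, round(ratio, 3))   # 589824 67840 0.115
```

Under these assumed sizes the separable layer needs roughly 11.5% of the weights of the standard layer. The actual saving in SiamAtten depends on layer configurations that the abstract does not state.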

Full Text