Abstract

The fully convolutional Siamese network (SiamFC) has demonstrated high performance in visual tracking, but its learned CNN features are redundant and not discriminative enough to separate the object from the background. To address this problem, this paper proposes a dual attention module that is integrated into the Siamese network to select features in both the spatial and channel domains. Specifically, a non-local attention module follows the last layer of the network, which helps obtain a self-attention feature map of the target in the spatial dimension. In addition, a channel attention module adjusts the importance of each channel's features according to the response that each channel's features produce with the target. Furthermore, our dual attention Siamese network (SiamDA) is trained on the GOT10k dataset to improve its target representation ability and thereby the discriminative power of the model. Experimental results show that the proposed algorithm improves accuracy by 7.6% and success rate by 5.6% compared with the baseline tracker.
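The dual attention idea above can be illustrated with a minimal NumPy sketch. It assumes the common embedded-Gaussian formulation of a non-local block for spatial self-attention (with weights shared between the template and search branches), and a hypothetical channel-weighting rule in which each channel's weight is a softmax over the peak of that channel's own template/search correlation response. The function names (`non_local_block`, `response_channel_attention`, `xcorr_valid`), the random weight matrices, the feature sizes, and the weighting rule are all illustrative assumptions, not SiamDA's published implementation.

```python
import numpy as np

def softmax(v, axis=-1):
    # Numerically stable softmax along the given axis.
    v = v - v.max(axis=axis, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block on a (C, H, W) feature map.
    1x1 convolutions are written as matrix products over the channel axis;
    the weight matrices are hypothetical placeholders (learned in practice)."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N), N = H*W positions
    theta, phi, g = w_theta @ flat, w_phi @ flat, w_g @ flat  # each (C', N)
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N): position i attends to every j
    y = g @ attn.T                           # y_i = sum_j attn[i, j] * g_j
    z = w_out @ y + flat                     # project back to C channels, add residual
    return z.reshape(c, h, w)

def xcorr_valid(search, template):
    """Single-channel 2-D 'valid' cross-correlation (SiamFC-style matching)."""
    th, tw = template.shape
    out = np.empty((search.shape[0] - th + 1, search.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + th, j:j + tw] * template)
    return out

def response_channel_attention(z, x):
    """Hypothetical rule: weight each template channel by a softmax over the
    peak of its own correlation response with the matching search channel."""
    peaks = np.array([xcorr_valid(x[k], z[k]).max() for k in range(z.shape[0])])
    weights = softmax(peaks) * len(peaks)    # rescaled so the weights average 1
    return z * weights[:, None, None], weights

# Illustrative sizes only: 8 channels, 6x6 template and 22x22 search features.
rng = np.random.default_rng(0)
C, Cp = 8, 4
z = rng.standard_normal((C, 6, 6))
x = rng.standard_normal((C, 22, 22))
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((Cp, C)) for _ in range(3))
w_out = 0.1 * rng.standard_normal((C, Cp))

z_sa = non_local_block(z, w_theta, w_phi, w_g, w_out)   # spatial attention, template
x_sa = non_local_block(x, w_theta, w_phi, w_g, w_out)   # spatial attention, search
z_ca, weights = response_channel_attention(z_sa, x_sa)  # channel attention
score = sum(xcorr_valid(x_sa[k], z_ca[k]) for k in range(C))  # fused score map
print(score.shape)  # (17, 17)
```

Weighting only the template channels means each channel's correlation map is scaled once by its weight before the maps are summed into the final score map; a learned tracker would replace the random matrices and the softmax rule with trained parameters.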
