Abstract

The fully convolutional Siamese network (SiamFC) has demonstrated high performance in visual tracking, but its learned CNN features are redundant and not discriminative enough to separate the object from the background. To address this problem, this paper proposes a dual attention module that is integrated into the Siamese network to select features in both the spatial and channel domains. Specifically, a non-local attention module is appended after the last layer of the network, which helps obtain a self-attention feature map of the target along the spatial dimension. In addition, a channel attention module is proposed to adjust the importance of each channel's features according to the response generated between that channel's feature and the target. Furthermore, the GOT10k dataset is employed to train our dual attention Siamese network (SiamDA) to improve its target representation ability, which enhances the discrimination of the model. Experimental results show that the proposed algorithm improves accuracy by 7.6% and success rate by 5.6% compared with the baseline tracker.
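To make the two attention steps concrete, the following is a minimal, dependency-free sketch and not the implementation used in SiamDA. It assumes a feature map stored as C channels of N flattened spatial positions. The residual connection in the spatial step, the per-channel inner product used as the "response", and the softmax normalization over channels are illustrative assumptions, as are all function names; the learned embeddings of a full non-local block are omitted.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def non_local_attention(feat):
    """Spatial self-attention over feat[c][n] (C channels, N positions).

    Each position attends to every position; the attended context is
    added back to the input (residual connection, an assumption here).
    """
    positions = list(zip(*feat))  # N vectors of length C
    attended = []
    for query in positions:
        weights = softmax([dot(query, key) for key in positions])
        context = [
            sum(w * p[c] for w, p in zip(weights, positions))
            for c in range(len(query))
        ]
        attended.append([q + ctx for q, ctx in zip(query, context)])
    return [list(channel) for channel in zip(*attended)]  # back to [c][n]


def channel_attention(search, template):
    """Reweight the channels of `search` by their response to `template`.

    The response of a channel is taken to be the inner product of the
    search and template channels (an assumption); the responses are
    normalized with a softmax to obtain the channel weights.
    """
    responses = [dot(s, t) for s, t in zip(search, template)]
    weights = softmax(responses)
    reweighted = [[w * v for v in ch] for w, ch in zip(weights, search)]
    return reweighted, weights


# Toy example: C = 2 channels, N = 3 spatial positions.
search = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
template = [[1.0, 0.0, 2.0], [0.0, 0.0, 0.0]]

spatial = non_local_attention(search)
refined, channel_weights = channel_attention(spatial, template)
```

In this toy example the first channel responds strongly to the template while the second does not respond at all, so the first channel receives the larger weight and the second is suppressed.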
