Abstract

Siamese trackers have recently attracted extensive attention because of their simplicity and low computational cost. However, most Siamese trackers use only a single frame of the video sequence as the template and do not update it during inference, so their tracking success rate falls below that of trackers that update the template online. In this study, we introduce an enhanced visual attention Siamese network (ESA-Siam). The method is built on a deep convolutional neural network and integrates channel attention and spatial self-attention to improve the tracker's ability to discriminate between positive and negative samples. Channel attention uses the response values of different channels, which reflect different targets, to achieve a better target representation. Spatial self-attention captures the correlation between any two positions to help locate the target. In addition, a template search attention module is designed to implicitly update the template features online, which effectively improves the success rate of the tracker when the target is disturbed by the background. The proposed ESA-Siam tracker shows superior performance compared with 18 existing state-of-the-art trackers on five benchmark datasets: OTB50, OTB100, VOT2016, VOT2018, and LaSOT.
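The two attention types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the ESA-Siam implementation: the squeeze-and-excitation style gating, the scaled dot-product form of the spatial self-attention, the residual connection, the function names, the feature shapes, and the random weights `w1` and `w2` are all assumptions chosen for illustration, since this summary does not give the exact formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map (assumed SE-style gating)."""
    squeeze = feat.mean(axis=(1, 2))               # (C,) global average per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)         # (C // r,) bottleneck with ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # (C,) sigmoid weights in (0, 1)
    return feat * gate[:, None, None]


def spatial_self_attention(feat):
    """Let every spatial position attend to every other position."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                   # (N, C), N = H * W positions
    attn = softmax(x @ x.T / np.sqrt(c), axis=-1)  # (N, N) pairwise correlations
    context = attn @ x                             # (N, C) aggregated features
    return feat + context.T.reshape(c, h, w)       # residual connection (assumed)


# Hypothetical sizes: 8 channels, 6x6 feature map, reduction ratio 4.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 6))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1

gated = channel_attention(feat, w1, w2)
out = spatial_self_attention(gated)
print(out.shape)  # (8, 6, 6)
```

In a trained tracker the weights would be learned and the attention would typically use separate query, key, and value projections; they are omitted here to keep the sketch short.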

Highlights

  • Visual object tracking is the process of identifying a region of interest in a video and following that target through the given video

  • Combining the target information of the search branch can help the tracker identify positive and negative samples better. Therefore, we design a template search collaborative attention module, called T-SCAttn, which can update the template features online. It can improve the robustness and the positive and negative sample discrimination of the tracker and better deal with the problems of low image resolution and target occlusion. The main contributions of our work are as follows: (1) We introduce a new twin network visual tracking algorithm based on the enhanced visual attention mechanism

  • We propose an enhanced visual attention Siamese network that can update template features online for visual tracking


Summary

Introduction

Visual object tracking is the process of identifying a region of interest in a video and following that target through the given video. Siamese-based trackers are trained offline on a large amount of data but do not update the target template online. The visual attention mechanism [24,25,26] can focus on the channels and locations of interest and select the feature information that best represents the tracking target. Therefore, we design a template search collaborative attention module, called T-SCAttn, which can update the template features online. It improves the robustness of the tracker and its discrimination between positive and negative samples, and it deals better with low image resolution and target occlusion. The main contributions of our work are as follows: (1) We introduce a new twin network visual tracking algorithm based on the enhanced visual attention mechanism (including channel attention, spatial self-attention, and template search collaborative attention). (4) Our approach achieves excellent tracking performance on the benchmark datasets OTB50 [29], OTB100 [30], VOT2016 [31], VOT2018 [32], and LaSOT [33], with tracking speeds of up to 60 fps.
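One way to picture how a template search attention module could update template features online is cross-attention from the template to the search region. The sketch below is a hedged illustration under stated assumptions, not the T-SCAttn module itself: the scaled dot-product form, the residual update, the function name `template_search_attention`, and the feature shapes are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def template_search_attention(template, search):
    """Refresh template features with information from the search region.

    template: (C, Ht, Wt) features of the template branch.
    search:   (C, Hs, Ws) features of the search branch (current frame).
    Returns the updated template and the (Nt, Ns) attention map.
    """
    c = template.shape[0]
    q = template.reshape(c, -1).T                    # (Nt, C) template positions as queries
    kv = search.reshape(c, -1).T                     # (Ns, C) search positions as keys/values
    attn = softmax(q @ kv.T / np.sqrt(c), axis=-1)   # (Nt, Ns) template-to-search weights
    context = attn @ kv                              # (Nt, C) search info per template position
    updated = template + context.T.reshape(template.shape)  # residual update (assumed)
    return updated, attn


# Hypothetical sizes: 8 channels, 4x4 template, 10x10 search region.
rng = np.random.default_rng(0)
template = rng.standard_normal((8, 4, 4))
search = rng.standard_normal((8, 10, 10))

updated, attn = template_search_attention(template, search)
print(updated.shape, attn.shape)  # (8, 4, 4) (16, 100)
```

Because the update depends on the search features of the current frame, the template representation changes from frame to frame without any gradient step, which is one reading of "implicitly update the template features online".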
