Abstract

We propose a multi-scale visual tracking algorithm based on an attention mechanism. It addresses a weakness of the region proposal network: its appearance model has limited ability to distinguish the foreground from the semantic background. The method adds an attention mechanism to the region proposal network to obtain an adaptive, salient feature representation. The attention mechanism is implemented with convolutional neural networks, and the feature optimization comprises spatial attention selection and channel attention selection. Specifically, the spatial attention network learns planar weights that enhance the foreground and suppress interfering background, while the channel attention network learns dimensional weights and discards redundant, noisy feature maps to simplify the appearance representation. In addition, the spatial and channel attention networks process high-level and low-level features, respectively, according to their structural differences, so as to focus on appearance-similarity features and semantic-classification features. Experimental results on challenging video sequences show strong performance compared with several state-of-the-art visual tracking methods.
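As a rough illustration of the two kinds of weights named in the abstract, the NumPy sketch below applies a planar (per-pixel) weight map and a dimensional (per-channel) weight vector to a feature tensor of shape (C, H, W). In the paper these weights are learned by convolutional networks. Here the logits are simply passed in, and the function names and the `drop_below` cutoff are hypothetical, since the abstract does not say how noisy feature maps are discarded.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def apply_spatial_weights(features, spatial_logits):
    """Scale every channel of a (C, H, W) tensor by one shared (H, W) weight map."""
    weights = sigmoid(spatial_logits)            # planar weights in (0, 1)
    return features * weights[None, :, :]        # broadcast over channels

def apply_channel_weights(features, channel_logits, drop_below=0.0):
    """Scale each channel by a scalar weight and zero out channels under a cutoff.

    drop_below is an assumed threshold used only to illustrate discarding
    feature maps; it is not taken from the paper.
    """
    weights = sigmoid(channel_logits)            # dimensional weights in (0, 1)
    weights = np.where(weights < drop_below, 0.0, weights)
    return features * weights[:, None, None]     # broadcast over H and W
```

Spatial weighting leaves the number of channels unchanged and only rescales locations, whereas channel weighting rescales or removes whole feature maps.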

Highlights

  • Visual tracking is one of the fundamental problems in image processing and computer vision

  • The contributions are threefold: 1) we introduce a soft attention mechanism on top of the Siamese region proposal network (SiamRPN) to build an adaptive appearance model and improve the ability to distinguish the foreground from the semantic background

  • The proposed method is compared with state-of-the-art tracking algorithms on two standard datasets: the online tracking benchmark (OTB) [17] and the visual object tracking benchmark (VOT) [32]

Summary

INTRODUCTION

Visual tracking is one of the fundamental problems in image processing and computer vision. The proposed method combines an attention mechanism with a region proposal network. The attention mechanism learns the planar and dimensional weights by constructing a spatial attention network and a channel attention network, respectively. The anchor-based region proposal network achieves multi-scale object tracking. The proposed method can significantly improve the ability to distinguish the foreground from the background and prevent the tracking results from quickly deviating from the real target, which effectively alleviates drift.

The spatial attention network adopts an hourglass-shaped residual network to highlight the foreground and suppress the semantic background. It reduces the size of the feature maps by convolution and down-sampling to highlight the high-level semantic features that correspond to the global receptive field. The channel attention network learns the dimensional weights to activate the feature types most relevant to the target and suppress insignificant feature channels, even eliminating noisy feature maps, so as to obtain an efficient appearance representation. The whole network enhances the feature difference between the foreground and background to improve discrimination, and significantly reduces time consumption.
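The hourglass idea behind the spatial attention network can be sketched in a few lines of NumPy: shrink the feature maps to gain a wide receptive field, expand the coarse result back to full resolution, and use it as a mask on the input. This is a minimal sketch under stated assumptions, not the authors' architecture. Average pooling and a per-channel mixing vector `mix` stand in for the learned convolutions, nearest-neighbour repetition stands in for the up-sampling path, and the residual form `features * (1 + mask)` is one common choice that the summary does not confirm.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def avg_pool2x(x):
    """Halve the spatial size of a (C, H, W) tensor with 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    """Double the spatial size of a (C, H, W) tensor by nearest-neighbour repeat."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def hourglass_spatial_attention(features, mix):
    """Return (attended features, mask) for a (C, H, W) input.

    H and W must be divisible by 4. mix is a hypothetical (C,) vector that
    collapses channels into one map, standing in for learned convolutions.
    """
    # Contracting half: two down-sampling steps widen the receptive field.
    low = avg_pool2x(avg_pool2x(features))             # (C, H/4, W/4)
    coarse = np.tensordot(mix, low, axes=1)            # (H/4, W/4)
    # Expanding half: restore the original resolution.
    logits = upsample2x(upsample2x(coarse[None]))[0]   # (H, W)
    mask = sigmoid(logits)                             # planar weights in (0, 1)
    # Residual attention: keep the input and add the masked copy (assumed form).
    return features * (1.0 + mask[None]), mask
```

With the assumed residual form, a mask value near zero leaves the input unchanged rather than erasing it, so a poorly trained mask cannot wipe out the features it is meant to refine.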

SIAMESE REGION PROPOSALS NETWORK
ATTENTION-BASED MULTI-SCALE VISUAL TRACKING
EXPERIMENT
IMPLEMENTATION DETAILS
Nvalid
CONCLUSION