Abstract

Most existing Siamese trackers use a pre-trained convolutional neural network to extract target features. However, because pre-trained deep features discriminate poorly between the target and the background, the performance of a Siamese tracker can degrade significantly when it faces similar objects or changes in target appearance. This paper proposes a multi-channel-aware deep features module and an adaptive hierarchical deep features module to enhance the discriminative ability of the tracker. First, the multi-channel-aware deep features module derives the importance of each feature channel from both the target details and the overall target information, identifying the most important feature channels. Second, the adaptive hierarchical deep features module determines the importance of each feature layer from the response values in each frame, so that features from different layers can be fused to represent the target and the tracker can better adapt to changes in target appearance. Finally, the two modules are integrated into the Siamese framework for target tracking. The Siamese network used in this paper is a symmetric neural network with two input branches that share the same weights, an architecture widely used in target tracking. Experiments on several benchmarks show that the proposed tracker outperforms the baseline tracker by several points.
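
The two modules are described above only at the level of the abstract. As a minimal sketch, assuming the channel importance is computed CBAM-style from global average pooling (overall information) and global max pooling (target details), and assuming each layer's weight in a frame is its peak response normalised over all layers, the idea could look like the following pure-Python code. The names `channel_weights`, `reweight_channels`, and `fuse_responses` are hypothetical, and the parameter-free sigmoid stands in for the learned layers a real module would use.

```python
import math

Matrix = list[list[float]]  # one H x W feature channel or response map


def channel_weights(feature: list[Matrix]) -> list[float]:
    """Score each channel from its overall (average) and detail (max) statistics.

    Assumption: a CBAM-style squeeze with average and max pooling; the
    parameter-free sigmoid replaces the learned layers of a real module.
    """
    weights = []
    for channel in feature:
        values = [v for row in channel for v in row]
        overall = sum(values) / len(values)  # global average pooling
        detail = max(values)                 # global max pooling
        weights.append(1.0 / (1.0 + math.exp(-(overall + detail))))
    return weights


def reweight_channels(feature: list[Matrix]) -> list[Matrix]:
    """Scale every channel by its importance weight."""
    ws = channel_weights(feature)
    return [[[w * v for v in row] for row in ch] for w, ch in zip(ws, feature)]


def fuse_responses(responses: list[Matrix]) -> tuple[list[float], Matrix]:
    """Fuse per-layer response maps using per-frame layer weights.

    Assumption: a layer's weight is its peak response divided by the sum of
    all layers' peaks (peaks are assumed to be positive).
    """
    peaks = [max(v for row in r for v in row) for r in responses]
    total = sum(peaks)
    layer_w = [p / total for p in peaks]
    h, w = len(responses[0]), len(responses[0][0])
    fused = [[sum(lw * r[i][j] for lw, r in zip(layer_w, responses))
              for j in range(w)] for i in range(h)]
    return layer_w, fused
```

For example, a layer whose response peaks at 3 receives three times the weight of a layer peaking at 1, so `fuse_responses` returns layer weights `[0.75, 0.25]` for that pair.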

Highlights

  • The OTB dataset is a public benchmark for evaluating target-tracking algorithms; it is divided into OTB50 [24] and OTB100 [25], containing 50 and 100 video sequences, respectively.

  • Compared with the attention-memory tracker MemDTC, our tracker lags behind by 0.7% in success rate and 0.5% in precision rate. We speculate that this is due to the dynamic memory network introduced by MemDTC, which allows the target template to adapt to changes in target appearance during tracking.
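
The success rate and precision rate quoted above are the usual OTB metrics. The sketch below assumes the commonly used OTB conventions: boxes given as (x, y, w, h), precision measured at a 20-pixel centre-error threshold, and success summarised as the area under the success plot over 21 IoU thresholds from 0 to 1. The helper names are hypothetical, and this is not the official evaluation toolkit.

```python
Box = tuple[float, float, float, float]  # (x, y, w, h): top-left corner plus size


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between box centres, in pixels."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return (dx * dx + dy * dy) ** 0.5


def precision_rate(pred: list[Box], gt: list[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose centre error is within `threshold` pixels."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(pred, gt))
    return hits / len(gt)


def success_auc(pred: list[Box], gt: list[Box], steps: int = 21) -> float:
    """Area under the success plot: mean success rate over IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

With one perfectly tracked frame and one complete miss, `precision_rate` gives 0.5 and `success_auc` gives 10/21, because a perfect overlap of 1.0 does not exceed the final threshold of 1.0.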

Introduction

Object tracking is a fundamental research topic in computer vision, with many everyday applications such as autonomous driving [1], video surveillance [2], and human–computer interaction [3]. Information about the tracked object is given only in the first frame, and the designed tracker must predict the new position of the target in each subsequent frame. Because target information is available only from the first frame, prior knowledge about the target is severely limited. A number of models have been proposed to extract target features for tracking, such as hand-crafted features [4], correlation filters [5,6,7], regressors [8,9], and classifiers [10,11,12]. While most Siamese-based trackers use pre-trained deep models to extract features for the tracking task, they pay less attention to how to learn more discriminative deep features.
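
To make the first-frame template matching concrete, here is a minimal sketch of how a Siamese tracker could locate the target: the same embedding is applied to the first-frame template and to the search region of a later frame, and the peak of their cross-correlation gives the new position. `embed` is a hypothetical stand-in (zero-meaning the patch) for the shared pre-trained CNN, and all names are illustrative rather than taken from the paper.

```python
Matrix = list[list[float]]


def embed(patch: Matrix) -> Matrix:
    """Hypothetical stand-in for the shared backbone: zero-mean the patch.

    Both branches call this same function, mirroring the weight sharing of a
    Siamese network; a real tracker would use a pre-trained CNN here.
    """
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in patch]


def cross_correlate(template: Matrix, search: Matrix) -> Matrix:
    """Slide the template over the search region (valid positions only)."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    return [[sum(template[i][j] * search[y + i][x + j]
                 for i in range(th) for j in range(tw))
             for x in range(sw - tw + 1)]
            for y in range(sh - th + 1)]


def locate(template: Matrix, search: Matrix) -> tuple[int, int]:
    """Return the (row, col) offset of the peak response in the search region."""
    response = cross_correlate(embed(template), embed(search))
    best = max((v, (y, x)) for y, row in enumerate(response)
               for x, v in enumerate(row))
    return best[1]
```

Embedding a 2×2 first-frame pattern and searching a 4×4 region that contains the same pattern at row 1, column 2 returns `(1, 2)`. Because both branches go through the same `embed`, the template and search features live in the same space, which is what makes the correlation peak meaningful.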
