Abstract

Siamese networks have attracted widespread attention in the field of visual tracking. In this paper, we propose a high-performance model based on a deep Siamese network (SiamFC-R22) for real-time visual tracking. To address the problem that most existing Siamese trackers cannot take advantage of the richer feature representations provided by deep networks, we construct a deep backbone architecture with a reasonable receptive field and stride by stacking redesigned residual modules. Furthermore, we propose a multi-layer aggregation (MLA) module to effectively fuse features from different layers. MLA consists of the RAC branch and the IL branch. RAC is used to boost the ability to learn representations of high-level semantic features, while IL is applied to better capture low-level features that contain more detailed information. Comprehensive experiments on the OTB2015 benchmark show that our proposed SiamFC-R22 achieves an AUC of 0.667. Meanwhile, it runs at over 60 frames per second, outperforming state-of-the-art competitors by a significant margin.
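The abstract does not specify the internals of the MLA module, but the two core operations it builds on can be illustrated. The sketch below shows, under assumptions, the basic Siamese cross-correlation (sliding a template feature map over a search-region feature map to produce a response map, as in SiamFC) and a simple weighted fusion of per-layer response maps as a stand-in for multi-layer aggregation; the function names and the weighted-sum fusion rule are illustrative, not the paper's actual design.

```python
import numpy as np

def cross_correlation(template, search):
    """Slide the template feature map over the search-region feature map
    and record the inner product at each offset (the SiamFC similarity
    operation, here for single-channel 2-D feature maps)."""
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw])
    return out

def fuse_responses(responses, weights):
    """Weighted sum of per-layer response maps -- a hypothetical,
    simplified stand-in for multi-layer aggregation; the paper's MLA
    module (RAC/IL branches) is more elaborate."""
    return sum(w * r for w, r in zip(weights, responses))

# Toy example: a 2x2 template matched against a 4x4 search region.
template = np.ones((2, 2))
search = np.arange(16.0).reshape(4, 4)
resp = cross_correlation(template, search)   # 3x3 response map
fused = fuse_responses([resp, resp], [0.5, 0.5])
```

The peak of the fused response map would indicate the most likely target location within the search region.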

