Abstract

Siamese networks have attracted widespread attention in the field of visual tracking. In this paper, we propose a high-performance model based on a deep Siamese network (SiamFC-R22) for real-time visual tracking. To address the problem that most existing Siamese trackers cannot exploit the richer feature representations provided by deep networks, we construct a deep backbone architecture with a suitable receptive field and stride by stacking redesigned residual modules. Furthermore, we propose a multi-layer aggregation (MLA) module to effectively fuse features from different layers. MLA consists of an RAC branch and an IL branch: RAC boosts the ability to learn representations of high-level semantic features, while IL captures a better expression of low-level features, which contain more detailed information. Comprehensive experiments on the OTB2015 benchmark show that the proposed SiamFC-R22 achieves an AUC of 0.667 while running at over 60 frames per second, outperforming state-of-the-art competitors by a significant margin.
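As background for readers unfamiliar with the SiamFC family of trackers, the core operation is cross-correlating a template (exemplar) feature map against a search-region feature map to produce a response map whose peak localizes the target. The following is a minimal NumPy sketch of that principle only; it is not the paper's implementation (the backbone, MLA, RAC, and IL components are not reproduced here), and the function and variable names are illustrative.

```python
import numpy as np

def cross_correlate(template, search):
    """Slide a template feature map over a search-region feature map
    and return the response map of SiamFC-style similarity scores."""
    tc, th, tw = template.shape
    sc, sh, sw = search.shape
    assert tc == sc, "channel dimensions must match"
    oh, ow = sh - th + 1, sw - tw + 1
    response = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = search[:, i:i + th, j:j + tw]
            response[i, j] = np.sum(window * template)
    return response

# Toy example: single-channel features; the response peaks where
# the search window best matches the template.
template = np.ones((1, 2, 2))
search = np.zeros((1, 4, 4))
search[0, 1:3, 1:3] = 1.0  # "target" located at offset (1, 1)
resp = cross_correlate(template, search)
peak = np.unravel_index(np.argmax(resp), resp.shape)  # → (1, 1)
```

In a real tracker the two feature maps would come from the shared-weight backbone branches, and the argmax of the (upsampled) response map gives the predicted target displacement between frames.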

