Abstract

Recently, Siamese trackers have shown excellent performance in both accuracy and speed. However, traditional trackers have poor robustness against similar objects because they use only a single deep feature and are limited by cosine windows. In this paper, we introduce SiamFF, a novel Siamese network that combines information fusion with rectangular window filtering. First, a multilevel fusion network is proposed. At the feature level, the shallow and deep features of the network are fused through a layer-hopping connection to obtain complementary feature maps. Then, the score maps generated from the complementary feature maps are further fused at the score level to improve robustness. In addition, based on the continuity and stationarity of object movement in real scenes, a score map filtering strategy is proposed. The relative displacement of the target is predicted from interframe information, and the moving direction is used to filter the score map, further suppressing interference from similar objects. Experimental results on the OTB2015 and VOT2016 benchmarks indicate that SiamFF performs favorably against many state-of-the-art trackers in terms of accuracy while maintaining real-time tracking speed.
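The score-level fusion and score map filtering described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the weighted-average fusion rule, the binary rectangular window, and every name and parameter (`fuse_score_maps`, `rectangular_window`, `locate_target`, `weight`, `half_size`) are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def fuse_score_maps(score_shallow, score_deep, weight=0.5):
    """Fuse two score maps by weighted averaging (assumed fusion rule)."""
    if score_shallow.shape != score_deep.shape:
        raise ValueError("score maps must have the same shape")
    return weight * score_shallow + (1.0 - weight) * score_deep


def rectangular_window(shape, center, displacement, half_size):
    """Boolean rectangular mask around the predicted target position.

    center:       (row, col) of the target in the previous score map
    displacement: (d_row, d_col) predicted from earlier frames
    half_size:    half side length of the window, in score-map cells
    """
    rows, cols = shape
    pred_row = int(round(center[0] + displacement[0]))
    pred_col = int(round(center[1] + displacement[1]))
    r0, r1 = max(pred_row - half_size, 0), min(pred_row + half_size + 1, rows)
    c0, c1 = max(pred_col - half_size, 0), min(pred_col + half_size + 1, cols)
    mask = np.zeros(shape, dtype=bool)
    mask[r0:r1, c0:c1] = True  # stays all False if the window is off the map
    return mask


def locate_target(score, mask=None):
    """Return the (row, col) of the peak, restricted to the mask if given."""
    if mask is not None and mask.any():
        # Responses outside the window are excluded from the search.
        score = np.where(mask, score, -np.inf)
    return tuple(int(i) for i in np.unravel_index(np.argmax(score), score.shape))


# Toy example: a distractor at (2, 3) scores higher than the target at (9, 10).
shallow = np.zeros((17, 17))
deep = np.zeros((17, 17))
shallow[2, 3], shallow[9, 10] = 1.0, 0.8
deep[2, 3], deep[9, 10] = 0.9, 0.85

fused = fuse_score_maps(shallow, deep)
window = rectangular_window(fused.shape, center=(8, 8), displacement=(1, 2), half_size=3)
unfiltered_peak = locate_target(fused)
filtered_peak = locate_target(fused, window)
```

In this toy case the unfiltered peak falls on the distractor, while the peak inside the window falls on the target. This is the behaviour the filtering strategy is meant to provide when a similar object produces a stronger response far from the predicted position.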

Highlights

  • Target tracking is one of the topical issues in the field of computer vision

  • Convolutional neural networks (CNNs) can be applied to extract deep features to improve tracking accuracy, but online updating greatly reduces the speed of trackers as networks become deeper

  • We introduce a multilevel fusion network. First, feature-level fusion is performed, in which the shallow and deep features are fused to obtain complementary feature maps


Summary

Introduction

Target tracking is one of the topical issues in the field of computer vision. After the target is initialized in the first frame of the video, the tracker generates a bounding box around it in each subsequent frame [1]. Deformation, occlusion and movement of the target during the tracking process make visual tracking challenging [2]–[4]. Correlation filters have demonstrated excellent tracking performance; they exploit the properties of the Fourier transform and circulant matrices to train the networks and update the parameters while tracking [5]. CNNs can be applied to extract deep features to improve tracking accuracy, but online updating greatly reduces the speed of trackers as networks become deeper. Under the CNN framework, Siamese trackers have demonstrated their
