Abstract

Recently, object trackers based on Siamese networks have attracted considerable attention due to their remarkable tracking performance and widespread application. In particular, anchor-based methods exploit a region proposal subnetwork to predict the target accurately and achieve large performance improvements. However, those trackers do not capture spatial information well, and the predefined anchors hinder robustness. To solve these problems, we propose a Siamese-based anchor-free object tracking algorithm with multiscale spatial attentions. First, we take ResNet-50 as the backbone network to generate multiscale features of both the template patch and the search region. Second, we propose the spatial attention extraction (SAE) block to capture the spatial information among all positions in the template and search region feature maps. Third, we feed these features into the SAE block to obtain the multiscale spatial attentions. Finally, an anchor-free classification and regression subnetwork predicts the location of the target. Unlike anchor-based methods, our tracker directly predicts the target position without predefined parameters. Extensive experiments against state-of-the-art trackers are carried out on four challenging visual object tracking benchmarks: OTB100, UAV123, VOT2016 and GOT-10k. The experimental results confirm the effectiveness of the proposed tracker.
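
To make the idea of predicting the target position "without predefined parameters" concrete, the following is a minimal sketch of how an anchor-free head's output could be decoded into a box. It assumes a common anchor-free formulation in which every feature-map cell regresses its distances to the left, top, right and bottom box edges; the abstract does not state that the paper uses exactly this parameterisation, and the function name, `stride` and `origin` are hypothetical.

```python
def decode_anchor_free(scores, offsets, stride, origin=0):
    """Pick the highest-scoring cell and turn its edge distances into a box.

    scores:  2-D list [row][col] of foreground scores.
    offsets: 2-D list [row][col] of (left, top, right, bottom) distances in pixels.
    stride:  search-region pixels per feature-map cell (assumed value).
    origin:  pixel coordinate of the centre of cell (0, 0) (assumed value).
    """
    rows, cols = len(scores), len(scores[0])
    # Location with the highest classification score.
    best_row, best_col = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: scores[rc[0]][rc[1]],
    )
    # Map the cell back to pixel coordinates in the search region.
    cx = origin + best_col * stride
    cy = origin + best_row * stride
    left, top, right, bottom = offsets[best_row][best_col]
    # No anchor shapes are involved: the box comes from the four distances alone.
    return (cx - left, cy - top, cx + right, cy + bottom)


# Toy 3 x 3 output maps, purely illustrative.
scores = [[0.1, 0.2, 0.1],
          [0.2, 0.3, 0.9],
          [0.1, 0.2, 0.3]]
offsets = [[(0, 0, 0, 0)] * 3 for _ in range(3)]
offsets[1][2] = (10, 6, 12, 8)

box = decode_anchor_free(scores, offsets, stride=8, origin=4)
print(box)  # (x1, y1, x2, y2) in search-region pixels
```

Because the box is read off a single location, there are no anchor scales or aspect ratios to choose, which is the property the abstract contrasts with anchor-based trackers.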

Highlights

  • Object trackers based on Siamese networks have attracted considerable attention due to their remarkable tracking performance and widespread application

  • Correlation filter-based (CF) trackers train a regressor on the target given in the initial frame of a video, and use this regressor together with the Fourier transform to calculate the location of the target in the candidate region

  • Following SiamFC [19], template patches of 127 × 127 pixels and search regions of 255 × 255 pixels are used for both training and testing
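
The template and search sizes in the last highlight come from SiamFC, which locates the target by sliding the template over the search region and scoring each offset by cross-correlation. The sketch below illustrates that matching step on raw 2-D lists. It is an assumption-laden toy: SiamFC-style trackers correlate learned feature maps rather than pixels, the highlights do not say how this paper's tracker combines the two branches, and both function names are hypothetical.

```python
def response_size(search_size, template_size, stride=1):
    """Number of valid template placements along one axis."""
    return (search_size - template_size) // stride + 1


def cross_correlate(search, template):
    """Dense 'valid' cross-correlation of two 2-D lists (no padding)."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Inner product between the template and the window at (y, x).
            row.append(sum(search[y + i][x + j] * template[i][j]
                           for i in range(th) for j in range(tw)))
        response.append(row)
    return response


# Toy example: a 2 x 2 pattern hidden in a 4 x 4 search region.
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

response = cross_correlate(search, template)
peak = max(((y, x) for y in range(3) for x in range(3)),
           key=lambda yx: response[yx[0]][yx[1]])
print(peak)                     # offset of the best match
print(response_size(255, 127))  # placements per axis if matched at pixel resolution
```

The search region is larger than the template so that the response map covers a neighbourhood of possible displacements; the position of its peak gives the target's offset between frames.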

Summary

Introduction

Object trackers based on Siamese networks have attracted considerable attention due to their remarkable tracking performance and widespread application. Anchor-based methods exploit a region proposal subnetwork to predict the target accurately and achieve large performance improvements. However, those trackers do not capture spatial information well, and the predefined anchors hinder robustness. Correlation filter-based (CF) trackers train a regressor on the target given in the initial frame of a video, and use this regressor together with the Fourier transform to calculate the location of the target in the candidate region. These CF-based trackers can track the object online and efficiently update the filter parameters during the process. RPN-based algorithms obtain accurate target bounding boxes by designing multiscale anchor boxes, but these hand-designed anchors seriously affect robustness and increase the influence of human factors.
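
The correlation-filter idea described above (train a regressor on the first frame, then locate the target through the Fourier transform) can be sketched in one dimension. This is a minimal illustration assuming a MOSSE-style closed-form ridge regression, which is one well-known CF formulation and not necessarily the one any cited tracker uses. The naive `dft` helper, the Gaussian target and the regulariser `lam` are all choices made for this example; a practical tracker would use 2-D patches and an FFT library.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def train_filter(patch, target, lam=1e-3):
    """Closed-form regressor in the Fourier domain: G * conj(F) / (|F|^2 + lam)."""
    F = dft(patch)
    G = dft(target)
    return [g * f.conjugate() / (abs(f) ** 2 + lam) for f, g in zip(F, G)]


def respond(filter_conj, patch):
    """Apply the learned filter to a new patch and return the real response."""
    Z = dft(patch)
    resp = dft([z * h for z, h in zip(Z, filter_conj)], inverse=True)
    return [r.real for r in resp]


def circular_shift(x, shift):
    """Move the content of x to the right by `shift` samples, wrapping around."""
    n = len(x)
    return [x[(i - shift) % n] for i in range(n)]


random.seed(0)
n = 32
centre = n // 2
patch = [random.gauss(0.0, 1.0) for _ in range(n)]            # "first frame"
target = [math.exp(-((i - centre) ** 2) / (2 * 2.0 ** 2))     # desired response
          for i in range(n)]

filter_conj = train_filter(patch, target)

# "Next frame": the same content displaced by 5 samples.
response = respond(filter_conj, circular_shift(patch, 5))
peak = max(range(n), key=lambda i: response[i])
print(peak - centre)  # estimated displacement of the target
```

Training and detection each reduce to element-wise operations between transformed signals, which is why such filters can be evaluated and updated online at low cost.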
