Abstract

Existing anchor-based Siamese trackers rely on the anchor’s design to predict the scale and aspect ratio of the target. However, these methods introduce many hyperparameters, leading to computational redundancy. In this paper, to achieve outstanding network efficiency, we propose a ConvNext-based anchor-free Siamese tracking network (CAFSN), which employs an anchor-free design to increase network flexibility and versatility. In CAFSN, to obtain an appropriate backbone network, the state-of-the-art ConvNext network is applied to the visual tracking field for the first time by improving the network stride and receptive field. Moreover, A central confidence branch based on Euclidean distance is offered to suppress low-quality prediction frames in the classification prediction network of CAFSN for robust visual tracking. In particular, we discuss that the Siamese network cannot establish a complete identification model for the tracking target and similar objects, which negatively impacts network performance. We build a Fusion network consisting of crop and 3Dmaxpooling to better distinguish the targets and similar objects’ abilities. This module uses 3DMaxpooling to select the highest activation value to improve the difference between it and other similar objects. Crop unifies the dimensions of different features and reduces the amount of computation. Ablation experiments demonstrate that this module increased success rates by 1.7% and precision by 0.5%. We evaluate CAFSN on challenging benchmarks such as OTB100, UAV123, and GOT-10K, validating advanced performance in noise immunity and similar target identification with 58.44 FPS in real time.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call