An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking

Tongtong Yuan, Wenzhu Yang, Qian Li, Yuxia Wang

doi:10.3390/electronics10091067

Abstract

Siamese trackers are widely used because they balance speed and accuracy well. Compared with anchor-based methods, anchor-free approaches can run faster without any drop in precision. Inspired by the Siamese network and the anchor-free idea, we propose an anchor-free Siamese network (AFSN) with multi-template updates for object tracking. To improve tracking performance, a dual-fusion method is adopted that combines multi-layer features and multiple prediction results. The low-level feature maps are concatenated with the high-level feature maps to make full use of both spatial and semantic information. To make the results as stable as possible, the final result is obtained by combining multiple prediction results. For template updating, a high-confidence multi-template update mechanism is used, in which the average peak-to-correlation energy (APCE) determines whether the template should be updated. We use the anchor-free network to implement object tracking in a per-pixel manner, which computes the object category and bounding boxes directly. Experimental results indicate that the average overlap and success rate of the proposed algorithm increase by about 5% and 10%, respectively, compared with the SiamRPN++ algorithm on the GOT-10k (Generic Object Tracking Benchmark) dataset.
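The average peak-to-correlation energy mentioned above can be illustrated with a minimal sketch. It uses the commonly cited definition, APCE = |F_max - F_min|^2 / mean((F - F_min)^2), over a response map F. The function names, the history-based rule, and the `ratio` threshold are hypothetical choices for illustration; the abstract does not state the exact update criterion used by AFSN.

```python
import numpy as np


def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy of a 2-D response map.

    A sharp single peak gives a high value; a flat or multi-peak map
    (typical under occlusion or clutter) gives a low value.
    """
    f_max = float(response.max())
    f_min = float(response.min())
    energy = float(np.mean((response - f_min) ** 2))
    if energy == 0.0:
        # Completely flat map: no peak at all, treat confidence as zero.
        return 0.0
    return (f_max - f_min) ** 2 / energy


def should_update_template(score: float, history: list, ratio: float = 0.8) -> bool:
    """Hypothetical rule (an assumption, not the paper's stated criterion):
    update only when the current APCE reaches `ratio` times the mean of
    previously observed APCE values."""
    if not history:
        return True
    return score >= ratio * float(np.mean(history))


# Example: a clean single-peak response versus a flat one.
peak_map = np.zeros((5, 5))
peak_map[2, 2] = 1.0
flat_map = np.full((5, 5), 0.3)

peak_score = apce(peak_map)
flat_score = apce(flat_map)
```

In this sketch a confident frame yields a large score and passes the check, while a flat response scores zero and leaves the template untouched, which is the intent of a high-confidence update.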

Highlights

Visual object tracking is a fundamental research direction in computer vision
We tested the proposed tracker on the UAV123 dataset in comparison with several representative trackers, including ATOM [13], RLS-RTMDNet [38], DaSiamRPN [4], SiamRPN [3], ECO [34], SRDCF [30], MEEM [33], and KCF [29]
Our anchor-free Siamese network (AFSN) improves the scores by 9.7%, 12.5%, and 23.4%, respectively, for the three indicators over RLS-RTMDNet

Summary

Introduction

Visual object tracking is a fundamental research direction in computer vision. It is widely used in diverse fields such as visual surveillance, vehicle tracking, and human–computer interaction [1]. Rapid progress has been made in visual object tracking, but it remains a great challenge in real-world applications, as objects under unconstrained recording conditions often suffer from illumination variation, heavy occlusion, background clutter, and scale deformation, to name a few [1].

To study the effectiveness of different levels of feature maps, we performed ablation experiments on OTB100. Multi-layer feature maps can improve tracking performance effectively. A model using a single feature map achieves a performance of

Methods

Results

Conclusion