Abstract

Object detection is widely used in object tracking; anchor-free object tracking provides an end-to-end single-object-tracking approach. In this study, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the reference object features from the initial frame, we directly predict the center point and size of the object in subsequent frames in a Siamese-structure network, without the need for per-frame post-processing operations. Unlike other anchor-free tracking approaches that are based on semantic segmentation and achieve anchor-free tracking by pixel-level prediction, SiamCPN directly obtains all information required for tracking, greatly simplifying the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the experience of different branches of the Siamese net. The model can accurately predict object location, implement appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient as it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on the GOT-10K and UAV123 benchmarks, and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100 while improving inference speed by a factor of 1.5 to 2.

Highlights

  • Single-object tracking is a fundamental problem in visual media processing

  • Our proposed Siamese center-prediction network (SiamCPN) can be treated as an encoding–decoding framework

  • Through feature extraction and correlation calculation in the center-prediction sub-network (CPN), the differences between the two input frames can be encoded into the response maps
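
The correlation step mentioned in the last highlight can be illustrated with a minimal sketch of depth-wise cross-correlation, in which each channel of the search-frame features is correlated only with the matching channel of the template features. This is not the authors' implementation: the function name `depthwise_xcorr`, the nested-list feature layout `[channel][row][col]`, and the absence of padding or stride are assumptions made purely for illustration.

```python
from typing import List

# Assumed layout for this sketch: features indexed as [channel][row][col].
Feature = List[List[List[float]]]


def depthwise_xcorr(search: Feature, template: Feature) -> Feature:
    """Correlate each search channel with the template channel of the same index.

    Hypothetical helper: no padding, stride 1, one response map per channel.
    """
    if len(search) != len(template):
        raise ValueError("search and template need the same number of channels")

    response: Feature = []
    for s_ch, t_ch in zip(search, template):
        th, tw = len(t_ch), len(t_ch[0])
        out_h = len(s_ch) - th + 1
        out_w = len(s_ch[0]) - tw + 1
        # Slide the template channel over the search channel and sum products.
        channel_map = [
            [
                sum(
                    s_ch[i + di][j + dj] * t_ch[di][dj]
                    for di in range(th)
                    for dj in range(tw)
                )
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        response.append(channel_map)
    return response
```

Because channels are never mixed, the output keeps one response map per feature channel, which is what lets later layers learn which channels carry the center and size information.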


Summary

Introduction

Single-object tracking is a fundamental problem in visual media processing. It is widely used in applications requiring the location and appearance characteristics (shape, color, etc.) of targets, such as interactive visual media editing, intelligent monitoring, human–computer interaction, and augmented reality. SiamRPN++ [6], DaSiamRPN [10], and SiamDW [11] improved tracking performance via the backbone network structure, residual block structure, sampling strategy, and in other ways. All of these approaches rely on a predefined configuration of anchors. After obtaining the prediction results of multiple adjacent pixels in the target area and upsampling the response map, the prediction results of multiple adjacent points are weighted and averaged to give the final target box. This post-processing procedure increases the computational burden during tracking. The contributions of this work are: an approach in which a few channels of the response maps are learned to directly predict the center and size of the target region, achieving anchor-free tracking; a CPN that adaptively correlates multistage outputs from the backbone; and a demonstration that SiamCPN has superior performance on multiple datasets and is competitive in inference speed with the other methods selected in this work.
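
To make the idea of reading the center and size directly from response-map channels concrete, the following is a minimal sketch rather than the paper's decoding procedure. It assumes five hypothetical single-channel maps (a center score map, x and y offset corrections, and width and height maps) and a hypothetical feature stride of 8; none of these names, the channel layout, or the stride value are taken from the source.

```python
from typing import List, Tuple

Map2D = List[List[float]]  # assumed layout: [row][col]


def decode_box(
    center_map: Map2D,
    offset_x: Map2D,
    offset_y: Map2D,
    width_map: Map2D,
    height_map: Map2D,
    stride: int = 8,  # assumed down-sampling factor of the response map
) -> Tuple[float, float, float, float]:
    """Return (cx, cy, w, h) read at the peak of the center score map."""
    rows, cols = len(center_map), len(center_map[0])
    # Pick the single highest-scoring cell; no anchors and no box averaging.
    best_i, best_j = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda ij: center_map[ij[0]][ij[1]],
    )
    # Apply the sub-cell correction, then map back to image coordinates.
    cx = (best_j + offset_x[best_i][best_j]) * stride
    cy = (best_i + offset_y[best_i][best_j]) * stride
    return cx, cy, width_map[best_i][best_j], height_map[best_i][best_j]
```

In this sketch a single lookup at the peak replaces the weighted averaging over neighbouring predictions described above, which is where the reduction in per-frame post-processing would come from.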

Related work
Overview
Siamese center prediction network
Center prediction sub-network
Self-adapted block
Depth-wise correlation
Objective
Implementation
Training
Testing
Comparison with state-of-the-art
Assessment using GOT-10K
Assessment using LaSOT
Assessment using VOT2016
Assessment using OTB-100
Assessment using UAV123
Conclusions