Disentangling classification and regression in Siamese‐based network for visual tracking

Xiaowei Zhang, Hong Liu, Luming Li, Yuanyuan Gao, Peng Yang

doi:10.1002/cpe.7246

Abstract

Siamese‐based trackers have made great progress in the visual tracking community. However, sharing the network structure between the classification and regression tasks limits a tracker's ability to produce robust classification predictions and accurate regression predictions. In this paper, we propose an effective visual tracking framework, the Siamese Disentangled Tracking‐Head (SiamDTH), which disentangles classification and regression in Siamese‐based networks in two respects: feature decoupling and a differentiated tracking‐head. First, we gather features from receptive fields of different scales and ratios, and decouple the correlation features through two different feature fusion modes, one for classification and one for regression. Second, we design a differentiated tracking‐head structure in the sibling head so that the parallel classification and regression tasks are each handled in a task‐specific way. Extensive experiments on the visual tracking benchmarks VOT2018, VOT2019, and OTB100 demonstrate that SiamDTH achieves state‐of‐the‐art performance at real‐time speed. Our source code is available at: https://github.com/xl0312/SiamDTH.

Full Text