Abstract

Sixth-generation (6G) wireless technology is a key enabler of the Internet of Things (IoT). The IoT has recently gained popularity because of its smart architectures and diverse applications. Among these applications, intelligent urban surveillance systems for smart cities are becoming increasingly important, so designing a robust visual tracking method has become an urgent task. Deep Siamese convolutional neural networks have recently been applied to visual tracking because of their ability to learn a matching function between the template and the target candidate. Unlike traditional Siamese networks, which treat the two branches separately, we propose deep Siamese cross-residual learning to entangle the two branches from the beginning to the end of the network. This strategy allows the two branches to exchange instance-specific information at different nodes of the network and to learn a more compact representation of the target. In addition, we propose a combined loss function consisting of two complementary tasks: one learns a matching function directly, and the other learns a classification function. Moreover, our model does not require any pretrained weights and is trained from scratch with a limited number of sequences. Extensive experiments show that our tracker performs favorably against state-of-the-art tracking methods.
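The abstract does not include implementation details, but a minimal sketch can illustrate the two ideas it names: a cross-residual exchange between the two Siamese branches, and a combined matching-plus-classification loss. The sketch below assumes a PyTorch-style implementation with template and candidate patches of the same spatial size; the names `CrossResidualBlock`, `combined_loss`, and the weighting factor `alpha` are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossResidualBlock(nn.Module):
    """Hypothetical cross-residual stage: each branch keeps its own features
    and adds a residual computed from the other branch, so template and
    candidate information is entangled at this node of the network."""

    def __init__(self, channels: int):
        super().__init__()
        # Separate transforms for the template (z) and candidate (x) branches.
        self.conv_z = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.conv_x = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, z: torch.Tensor, x: torch.Tensor):
        # Cross-residual exchange: each output mixes the branch's own
        # features with a transformed copy of the other branch's features.
        z_out = z + self.conv_x(x)
        x_out = x + self.conv_z(z)
        return z_out, x_out


def combined_loss(match_score, match_label, cls_logits, cls_label, alpha=1.0):
    """Hypothetical combined objective: a matching loss (does the candidate
    match the template?) plus a classification loss, weighted by alpha."""
    match_loss = F.binary_cross_entropy_with_logits(match_score, match_label)
    cls_loss = F.cross_entropy(cls_logits, cls_label)
    return match_loss + alpha * cls_loss
```

Stacking several such blocks would let the branches exchange instance-specific information at multiple depths rather than only at the final matching layer, which is the intuition the abstract describes.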
