Single-object tracking algorithms based on Siamese fully convolutional networks have attracted considerable attention owing to their improvements in precision and speed. However, because this tracking model learns only a similarity model offline, it cannot obtain the discriminative feature information needed to adapt to the variations of targets in complex scenes. To improve the performance of this tracking model, we propose a Siamese network tracking algorithm that incorporates multiple attention mechanisms and an adaptive update strategy for background features. First, we propose a backbone feature extraction network that uses small convolutional kernels to fuse skip-connection features, thereby improving the feature representation capability of the network. Second, we propose an adaptive update strategy for background features to improve the model’s ability to discriminate between object and background features. Third, we fuse multiple attention mechanisms so that the model learns to focus on channel, spatial, and coordinate features. Fourth, we apply a response fusion operation after the cross-correlation operation to enrich the output response of the model. Finally, our algorithm is trained on the GOT-10K dataset and evaluated on the object tracking benchmark datasets OTB100 and VOT2018. The test results show that, compared with other algorithms, our algorithm effectively copes with the degradation of tracking performance in complex environments and further improves tracking precision while maintaining tracking speed.
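The cross-correlation step that Siamese trackers rely on can be illustrated with a minimal sketch. It is not the network described in the abstract: it assumes single-channel feature maps stored as plain Python lists, whereas a real tracker correlates multi-channel deep features produced by the backbone. The names `cross_correlate` and `peak_location` and the toy arrays are hypothetical and chosen only for illustration.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (both 2-D lists) and return the response map.

    Each response value is the sum of element-wise products between the
    template and the search window at that offset (no padding, stride 1).
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak_location(response):
    """Return the (row, col) offset with the highest response value."""
    best = max(
        ((value, (i, j))
         for i, row in enumerate(response)
         for j, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1]


# Toy example (assumed data): the template pattern sits at offset (1, 2).
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

response = cross_correlate(search, template)
print(peak_location(response))
```

In this sketch the peak of the response map marks the offset where the search region best matches the template, which is the quantity a Siamese tracker reads off to locate the target in each frame.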