To address tracking failures caused by target deformation, flipping, and occlusion in visual tracking, a template updating algorithm based on image structural similarity is proposed, which dynamically updates the template to adapt to changes in the target during tracking. Specifically, a queue stores the tracking results of the most recent N frames, and the structural similarity score between the current tracking result and the template image determines whether the template should be updated. If so, the optimal target image is selected from the historical N-frame tracking results as the new template for subsequent tracking. A tracking feature enhancement module and a segmentation feature enhancement module are also designed on top of the SiamMask network. The tracking feature enhancement module consists of non-local operations and convolutional downsampling; it establishes contextual correlation, enhances target features, suppresses background interference, improves tracking robustness, and mitigates the feature attenuation caused by target occlusion. The segmentation feature enhancement module introduces the convolutional block attention module and deformable convolution to improve the network's ability to capture channel and spatial features and to adaptively learn the shape and contour of the target, which enhances segmentation accuracy for the tracked target and in turn improves tracking accuracy. Experiments show that the proposed algorithm handles these problems well and stably: compared with the baseline SiamMask, it improves the expected average overlap by 5.2%, 5.3%, and 2.5% and the robustness by 6%, 7.9%, and 15.6% on the VOT2016, VOT2018, and VOT2019 datasets, respectively, while running in real time at 91 frames per second.
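The queue-based template update described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions, since the abstract does not specify them: the names `global_ssim` and `TemplateUpdater` are hypothetical, the similarity is a simplified single-window SSIM computed over the whole patch rather than a windowed SSIM, an update is triggered when the score falls below a hypothetical `threshold`, and the "optimal" target image is taken to be the queued result with the highest mean similarity to the other queued results.

```python
from collections import deque

import numpy as np


def global_ssim(a, b, data_range=255.0):
    """Simplified SSIM computed once over the whole patch (no sliding window)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


class TemplateUpdater:
    """Keeps the last N tracking results and decides when to swap the template."""

    def __init__(self, template, n_frames=5, threshold=0.5):
        self.template = template
        self.history = deque(maxlen=n_frames)  # oldest result is dropped automatically
        self.threshold = threshold             # assumed decision rule, not from the paper

    def step(self, result):
        """Record one tracking result; return True if the template was replaced."""
        self.history.append(result)
        if global_ssim(result, self.template) >= self.threshold:
            return False  # template still matches the target well enough
        self.template = self._most_representative()
        return True

    def _most_representative(self):
        # Assumed selection rule: the queued patch most similar, on average,
        # to the other queued patches.
        patches = list(self.history)
        if len(patches) == 1:
            return patches[0]

        def mean_similarity(i):
            others = [p for j, p in enumerate(patches) if j != i]
            return np.mean([global_ssim(patches[i], p) for p in others])

        best = max(range(len(patches)), key=mean_similarity)
        return patches[best]
```

In use, a tracker would call `step` once per frame with the cropped tracking result, resized to the template's shape, and read `updater.template` when matching the next frame. All patches are assumed to be grayscale arrays of identical shape.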