Recently, template-based trackers have become the leading tracking algorithms, with promising performance in terms of both efficiency and accuracy. However, the correlation operation between the query feature and the given template achieves accurate target localization but is prone to state-estimation error, especially when the target undergoes severe deformation. To address this issue, segmentation-based trackers have been proposed that use per-pixel matching to effectively improve tracking performance on deformable objects. However, most existing trackers match only against the target features of the initial frame, and thus lack the discriminative power to handle challenging factors such as similar distractors, background clutter, and appearance change. To this end, we propose a dynamic compact memory embedding to enhance the discrimination of segmentation-based visual tracking, enabling the tracker to distinguish the target from the background. Specifically, we initialize a memory embedding with the target features of the first frame. During tracking, target features from the current frame that correlate with the existing memory are merged into the memory embedding online. To further improve tracking accuracy for deformable objects, we use a weighted point-to-global matching strategy that measures the correlation between each pixelwise query feature and the whole template, capturing more detailed deformation information. Extensive evaluations on six challenging tracking benchmarks, including VOT2016, VOT2018, VOT2019, GOT-10K, TrackingNet, and LaSOT, demonstrate the superiority of our method over recent strong trackers. In addition, our tracker outperforms the leading segmentation-based trackers, i.e., D3S and SiamMask, on the DAVIS2017 benchmark. The code is available at https://github.com/peace-love243/CMEDFL.
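The two ideas sketched in the abstract, weighted point-to-global matching and an online-updated memory embedding, can be illustrated in a few lines of NumPy. This is only a hedged sketch under simplifying assumptions: the function names, the cosine-similarity measure, and the fixed correlation threshold are illustrative choices, not the paper's exact formulation (the actual method keeps the memory compact rather than simply appending features).

```python
import numpy as np


def point_to_global_match(query, template, weights):
    """Weighted point-to-global matching (illustrative sketch).

    query:    (N, C) pixelwise query features, L2-normalized per row
    template: (M, C) template / memory feature vectors, L2-normalized
    weights:  (M,)   per-template-vector weights

    Returns an (N,) response: each query pixel is matched against the
    whole template via a weighted sum of similarities, rather than
    against a single template vector.
    """
    sim = query @ template.T      # (N, M) cosine similarities
    return sim @ weights          # weighted aggregation over the template


def update_memory(memory, features, sim_thresh=0.7):
    """Online memory update (illustrative sketch).

    Appends current-frame target features whose maximum similarity to
    the existing memory exceeds a threshold; a threshold is one simple
    way to realize "features that correlate with the existing memory".
    """
    sim = features @ memory.T                # (N, M)
    keep = sim.max(axis=1) > sim_thresh      # correlated with memory
    return np.vstack([memory, features[keep]])
```

A usage example: normalize random feature rows, match six query pixels against a three-vector memory, then grow the memory with correlated features.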