Abstract

Making full use of temporal and spatial information is critical for coping with the appearance changes of objects in visual object tracking. However, existing trackers that employ a frame-level memory network to learn this information introduce redundancy and cannot build long-term relationships among historical frames due to the limited memory size. In this paper, we propose a novel memory network, Pixel-level Spatio-Temporal Memory (PSTM), which organizes object features efficiently to leverage temporal and spatial context. Specifically, PSTM is constructed and updated by a memory writer, which uses a pixel-level updating strategy to maintain temporal consistency and dynamically memorize noteworthy variations. Furthermore, to exploit relationships between the object and the search region and to precisely estimate the object's state, we propose a memory reader, the Pixel-wise Matching and Refinement module (PMR), which models spatial context without a complex hand-designed mechanism. Comprehensive experiments and comparisons on challenging large-scale benchmarks, including GOT-10k, TrackingNet, LaSOT, OTB2015, VOT2020, and NfS, demonstrate the effectiveness of the proposed method, which performs favorably against state-of-the-art trackers.
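The abstract does not give the exact update rule, but the idea of a pixel-level memory writer that keeps consistent regions stable while memorizing noteworthy variations can be sketched as a per-pixel gated blend. The function below is a minimal illustrative sketch, not the paper's formulation: the function name, the cosine-similarity gate, and the temperature `tau` are all assumptions introduced here for illustration.

```python
import numpy as np

def pixel_level_update(memory, frame_feat, tau=1.0):
    """Illustrative per-pixel memory write (NOT the paper's exact rule).

    memory, frame_feat: feature maps of shape (C, H, W).
    Pixels whose incoming features diverge from the stored memory get a
    larger write weight, so noteworthy variations are memorized while
    consistent regions stay largely unchanged.
    """
    # Cosine similarity between memory and incoming features at each pixel.
    dot = (memory * frame_feat).sum(axis=0)
    norm = np.linalg.norm(memory, axis=0) * np.linalg.norm(frame_feat, axis=0) + 1e-8
    sim = dot / norm                                  # (H, W), in [-1, 1]
    # Low similarity -> high write gate: memorize the change.
    gate = 1.0 / (1.0 + np.exp(-(1.0 - sim) / tau))   # (H, W), in (0, 1)
    # Convex per-pixel blend of old memory and new features.
    return (1.0 - gate) * memory + gate * frame_feat

rng = np.random.default_rng(0)
mem = rng.standard_normal((8, 4, 4))
feat = rng.standard_normal((8, 4, 4))
updated = pixel_level_update(mem, feat)
print(updated.shape)  # (8, 4, 4)
```

Because the gate lies in (0, 1), each memory pixel is a convex combination of its old value and the new feature, which is one simple way to realize the temporal-consistency property the abstract describes.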
