Visual object tracking (VOT) for intelligent video surveillance has attracted great attention in the research community, thanks to advances in computer vision and camera technology. Meanwhile, discriminative correlation filter (DCF) trackers have garnered significant interest owing to their high accuracy and low computational cost. Many researchers have introduced spatial and temporal regularization into the DCF framework to obtain a more robust appearance model and further improve tracking performance. However, these algorithms typically use fixed spatial and temporal regularization parameters, which limits their flexibility and adaptability in cluttered and challenging scenarios. To overcome these problems, we propose a new dynamic spatial–temporal regularization for the DCF tracking model that encourages the filter to concentrate on more reliable regions during the training stage. Furthermore, we present a response deviation-suppressed regularization term that encourages temporal consistency and avoids model degradation by suppressing relative response changes between two consecutive frames. Moreover, we introduce a multi-memory tracking framework that exploits various features, with each memory contributing to tracking the target across all frames. Extensive experiments on the OTB-2013, OTB-2015, TC-128, UAV-123, UAVDT, and DTB-70 datasets show that the proposed method outperforms many state-of-the-art DCF-based and deep learning-based trackers in terms of tracking accuracy and success rate.
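As a rough illustration of the kind of objective this family of trackers optimizes, a spatially–temporally regularized DCF loss augmented with a response-consistency penalty can be sketched as follows; the symbols w (spatial weight map), \mu, and \gamma, and the exact form of each term, are illustrative assumptions in the spirit of STRCF-style models, not the paper's actual formulation:

\min_{f_t}\; \frac{1}{2}\Big\|\sum_{d=1}^{D} x_t^{d} \star f_t^{d} - y\Big\|^2
  \;+\; \frac{1}{2}\sum_{d=1}^{D}\big\|w \odot f_t^{d}\big\|^2
  \;+\; \frac{\mu}{2}\,\big\|f_t - f_{t-1}\big\|^2
  \;+\; \frac{\gamma}{2}\,\big\|R_t - R_{t-1}\big\|^2,

where x_t^d is the d-th feature channel at frame t, \star denotes correlation, y is the desired (typically Gaussian) response, the w-weighted term penalizes filter energy in unreliable spatial regions (made dynamic in this work rather than fixed), the \mu-term ties the filter to its previous estimate for temporal smoothness, and the \gamma-term suppresses the deviation between consecutive response maps R_t and R_{t-1}, corresponding in spirit to the response deviation-suppressed regularization described above.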