Abstract
Multi-object tracking (MOT) in satellite videos is challenging because objects are small and their features blurry, which often leads to intermittent detections and unstable tracks. Many existing object detection and tracking models struggle with these issues because they are not designed for the unique characteristics of satellite videos. To address these challenges, we propose LocaLock, a joint detection and tracking framework for MOT that incorporates feature matching concepts from single object tracking (SOT) to improve tracking stability and reduce intermittent tracking results. Specifically, LocaLock uses an anchor-free detection backbone for efficiency and employs a local cost volume (LCV) module to perform precise feature matching within a local area. This matching provides valuable object priors to the detection head, enabling the model to “lock” onto objects with greater accuracy and mitigating the instability associated with small object detection. Because the LCV module computes matches only locally, its computational complexity and memory usage remain low. LocaLock also incorporates a novel motion flow (MoF) module that accumulates and exploits temporal information, further improving feature robustness and consistency across frames. Evaluations on the VISO dataset show that LocaLock surpasses existing methods in tracking accuracy and precision in the demanding satellite video analysis domain. Notably, LocaLock achieves state-of-the-art performance on the VISO benchmark, with a multi-object tracking accuracy (MOTA) of 62.6, while maintaining a fast running speed.
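To make the idea of local feature matching concrete, the following is a minimal sketch of a local cost volume, not the LocaLock implementation. It assumes dot-product correlation between two feature maps of shape (C, H, W) and a square search window; the function name `local_cost_volume` and the `radius` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def local_cost_volume(feat_prev, feat_curr, radius=1):
    """Correlate each location of feat_curr with a local window of feat_prev.

    Assumption: matching cost is a channel-averaged dot product.
    feat_prev, feat_curr: arrays of shape (C, H, W).
    Returns an array of shape ((2*radius + 1)**2, H, W); channel
    dy * k + dx holds the match score for offset (dy - radius, dx - radius).
    """
    c, h, w = feat_curr.shape
    k = 2 * radius + 1
    # Zero-pad the previous features so every window stays in bounds.
    padded = np.pad(feat_prev, ((0, 0), (radius, radius), (radius, radius)))
    volume = np.empty((k * k, h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # Previous-frame features shifted by (dy - radius, dx - radius).
            shifted = padded[:, dy:dy + h, dx:dx + w]
            volume[dy * k + dx] = (feat_curr * shifted).sum(axis=0) / c
    return volume
```

Under these assumptions the output holds (2·radius + 1)² scores per pixel, so memory grows linearly with the number of pixels, whereas a global cost volume that compares every pixel with every other pixel grows quadratically. This is the kind of saving that restricting the computation to a local area provides.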