Abstract
Multi-object tracking (MOT) in satellite videos is an essential task with many applications, such as traffic monitoring and disaster response. However, many multi-object trackers that perform well in natural scenes generalize poorly to satellite videos, where low spatial resolution and widespread indistinguishable backgrounds, such as clouds and reflections, make objects hard to discriminate. In this paper, we design a novel multi-object tracking framework for satellite videos, called CFTracker (Cross-Frame Tracker), addressing both the network structure and the training method. On the one hand, in the network structure, we propose a cross-frame feature update module (CFU) that exploits rich temporal semantic information to enhance object recognition and suppress responses to background noise. On the other hand, we show that the image-pair training scheme used by mainstream MOT networks is not fully conducive to learning temporal semantic information. To better capture cross-frame feature connections and produce temporally consistent motion predictions, we train CFTracker with a novel cross-frame training flow (CT). Experiments demonstrate the effectiveness of CFTracker, which achieves state-of-the-art tracking accuracy and precision with scores of 72.9% on the AIR-MOT dataset and 57.1% on the VISO dataset. The code will be available online.
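The abstract does not detail how the cross-frame feature update is computed, so the following is only an illustrative sketch of the general idea of fusing features from a previous frame into the current frame; the module name, the gated-attention design, and all hyperparameters are assumptions for illustration, not the authors' CFU implementation.

```python
# Illustrative sketch only: a generic cross-frame feature fusion block
# (channel-gated residual update over concatenated previous/current features).
# All names and design choices here are assumptions, not the paper's CFU.
import torch
import torch.nn as nn


class CrossFrameFusion(nn.Module):
    """Fuse backbone features from the previous and current frames."""

    def __init__(self, channels: int):
        super().__init__()
        # Reduce concatenated features back to the original channel count.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Predict a per-channel gate weighting temporal context against current cues.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_prev: torch.Tensor, feat_curr: torch.Tensor) -> torch.Tensor:
        fused = self.reduce(torch.cat([feat_prev, feat_curr], dim=1))
        g = self.gate(fused)
        # Gated residual update: keep current-frame features, add temporal context.
        return feat_curr + g * fused


# Example usage on dummy feature maps (batch=1, 64 channels, 96x96 spatial grid).
if __name__ == "__main__":
    f_prev = torch.randn(1, 64, 96, 96)
    f_curr = torch.randn(1, 64, 96, 96)
    out = CrossFrameFusion(64)(f_prev, f_curr)
    print(out.shape)  # torch.Size([1, 64, 96, 96])
```

In this kind of design, the residual form keeps the current-frame features intact while the gate decides, per channel, how much temporal context to inject, which is one plausible way to suppress responses to static background clutter such as clouds.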