Abstract
Multi-object tracking (MOT) in satellite videos is an essential task with many applications, such as traffic monitoring and disaster response. However, many multi-object trackers that perform well in natural scenes generalize poorly to satellite videos, where low spatial resolution and widespread indistinguishable background, such as clouds and reflections, make objects hard to discriminate. In this paper, we design a novel multi-object tracking framework for satellite videos, called CFTracker (Cross-Frame Tracker), from the perspectives of both network structure and training method. On the one hand, in the network structure, a cross-frame feature update module (CFU) is proposed to enhance object recognition and suppress responses to background noise by exploiting rich temporal semantic information. On the other hand, we reveal that the image-pair training scheme used by mainstream MOT networks is not fully conducive to learning temporal semantic information. To better capture cross-frame feature connections and produce temporally consistent motion predictions, we train CFTracker with a novel cross-frame training flow (CT). Experiments demonstrate the effectiveness of CFTracker, which achieves state-of-the-art tracking accuracy and precision with scores of 72.9% on the AIR-MOT dataset and 57.1% on the VISO dataset. The code will be available online.
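To make the cross-frame feature update idea concrete, below is a minimal sketch of how features from a previous frame might be fused into the current frame's feature map. The abstract does not specify the CFU internals, so the module name `CrossFrameFeatureUpdate`, the attention-based fusion, and all layer sizes are assumptions for illustration only, not the paper's implementation.

```python
# Illustrative sketch only: the CFU internals are not given in the abstract;
# the attention-based fusion and channel sizes below are assumptions.
import torch
import torch.nn as nn

class CrossFrameFeatureUpdate(nn.Module):
    """Hypothetical cross-frame feature update: enrich the current frame's
    feature map with temporal context from the previous frame, which can help
    suppress responses to static background such as clouds and reflections."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat_curr: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_curr.shape
        q = self.query(feat_curr).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.key(feat_prev).flatten(2)                     # (B, C, HW)
        v = self.value(feat_prev).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)         # cross-frame affinity
        context = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return feat_curr + self.out(context)                   # residual temporal update

# Usage: fuse features extracted from two consecutive satellite-video frames.
cfu = CrossFrameFeatureUpdate(channels=256)
f_t, f_tm1 = torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64)
updated = cfu(f_t, f_tm1)  # same shape as f_t
```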