Abstract

Satellite video is an emerging form of surface observation data that has drawn increasing interest due to its potential for spatiotemporal dynamic analysis. Single object tracking in satellite videos enables the continuous acquisition of object positions and extents and establishes correspondences across the video sequence. However, small-sized objects are vulnerable to rotation and non-rigid deformation. Moreover, the horizontal bounding box used by most trackers struggles to provide accurate semantic representations such as object position, orientation, and spatial distribution. In this article, we propose a unified framework, named rotation equivariant Siamese network enhanced by probability segmentation (REPS), to improve tracking accuracy and semantic representations simultaneously. First, to deal with the inconsistency of representations, we design a rotation equivariant (RE) Siamese network architecture to detect the rotation variations of objects right from the start frame, achieving RE tracking. Second, a pixel-level (PL) refinement is proposed to refine the spatial distribution of objects. In addition, we propose an adaptive Gaussian fusion that synergizes tracking and segmentation results to obtain compact outputs for satellite object representations. Extensive experiments on satellite videos demonstrate the superiority of the proposed approach. The code will be available at https://github.com/YZCU/REPS.
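To make the fusion step concrete, the sketch below illustrates one plausible way a Gaussian prior derived from the tracker's estimated box could be combined with a pixel-level segmentation probability map to produce a compact object mask. The abstract does not specify the adaptive weighting rule or the exact prior construction used in REPS, so the fixed convex combination, the sigma heuristic, and all function names here are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def gaussian_prior(h, w, center, sigma):
    """Build a 2-D Gaussian prior centered on the tracker's estimated
    object center. Sigma is tied to the box size here, which is an
    assumption rather than the paper's definition."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    sy, sx = sigma
    return np.exp(-0.5 * (((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2))

def adaptive_gaussian_fusion(seg_prob, box_center, box_size,
                             alpha=0.5, thresh=0.5):
    """Fuse segmentation probabilities with a Gaussian prior placed at the
    tracked location, then threshold to a compact binary mask. The weight
    `alpha` stands in for whatever adaptive rule REPS actually uses."""
    h, w = seg_prob.shape
    prior = gaussian_prior(h, w, box_center,
                           (box_size[0] / 2.0, box_size[1] / 2.0))
    fused = alpha * seg_prob + (1.0 - alpha) * prior
    return fused >= thresh

# Toy usage: a 64x64 frame with a high-probability region near (30, 40).
seg_prob = np.random.rand(64, 64) * 0.3
seg_prob[25:35, 35:45] = 0.9          # simulated segmentation response
mask = adaptive_gaussian_fusion(seg_prob, box_center=(30, 40), box_size=(10, 10))
print(mask.sum(), "pixels retained in the fused mask")
```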
