Despite significant advancements in remote sensing object tracking (RSOT) in recent years, achieving accurate and continuous tracking of tiny-sized targets remains a challenging task due to similar object interference and other related issues. In this paper, from the perspective of feature enhancement and a better feature matching strategy, we present a tracker SiamTM specifically designed for RSOT, which is mainly based on a new target information enhancement (TIE) module and a multi-level matching strategy. First, we propose a TIE module to address the challenge of tiny object sizes in satellite videos. The proposed TIE module goes along two spatial directions to capture orientation and position-aware information, respectively, while capturing inter-channel information at the global 2D image level. The TIE module enables the network to extract discriminative features of the targets more effectively from satellite images. Furthermore, we introduce a multi-level matching (MM) module that is better suited for satellite video targets. The MM module firstly embeds the target feature map after ROI Align into each position of the search region feature map to obtain a preliminary response map. Subsequently, the preliminary response map and the template region feature map are subjected to the Depth-wise Cross Correlation operation to get a more refined response map. Through this coarse-to-fine approach, the tracker obtains a response map with a more accurate position, which lays a good foundation for the prediction operation of the subsequent sub-networks. We conducted extensive experiments on two large satellite video single-object tracking datasets: SatSOT and SV248S. Without bells and whistles, the proposed tracker SiamTM achieved competitive results on both datasets while running at real-time speed.
Read full abstract