Single Object Tracking in Satellite Videos: Deep Siamese Network Incorporating an Interframe Difference Centroid Inertia Motion Model

Kun Zhu, Xiaodong Zhang, Zhiyong Lv, Xiujuan Cui, Yinan Zuo, Guanzhou Chen, Puyun Liao, Hongyu Wu, Xiaoliang Tan

doi:10.3390/rs13071298

Abstract

Single object tracking in satellite videos has attracted wide attention. The development of remote sensing platforms for earth observation makes it increasingly convenient to acquire high-resolution satellite videos, which greatly accelerates work on ground target tracking. However, very large images containing small objects, high similarity among multiple moving targets, and poor distinguishability between the objects and the background make this task highly challenging. To solve these problems, this paper proposes a deep Siamese network (DSN) incorporating an interframe difference centroid inertia motion (ID-CIM) model. For object tracking, the DSN comprises a template branch and a search branch; it extracts features from both branches and employs a Siamese region proposal network to obtain the position of the target in the search branch. The ID-CIM mechanism is proposed to alleviate model drift. Together, these two modules form the ID-DSN framework and mutually reinforce the final tracking results. In addition, we used existing object detection datasets for remotely sensed images to generate training datasets suitable for satellite video single object tracking. Ablation experiments were performed on six high-resolution satellite videos acquired from the International Space Station and “Jilin-1” satellites. We compared the proposed ID-DSN with 11 other state-of-the-art trackers, covering different networks and backbones. The comparison shows that ID-DSN obtained a precision score of 0.927 and a success score of 0.694 at 32.117 frames per second (FPS) on a single NVIDIA GTX1070Ti GPU.

Highlights

We propose a method for generating training datasets suitable for satellite video single object tracking tasks
We propose a centroid inertia motion model based on interframe difference, because target trajectories in satellite video are continuous and approximately linear [9]
The precision plot shows, for each threshold in a given range, the percentage of frames in which the point distance (PD) between the target center location estimated by the tracker and the corresponding ground-truth center location is less than that threshold
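
The precision plot described in the last highlight can be illustrated with a minimal Python sketch. It assumes that centers are given as (x, y) pixel coordinates and that the point distance is the Euclidean distance between them; the function name `precision_curve`, the sample coordinates, and the threshold values are hypothetical and are not taken from the paper.

```python
import math


def precision_curve(pred_centers, gt_centers, thresholds):
    """Return, for each threshold, the fraction of frames whose
    center error (point distance, PD) is strictly below it."""
    if len(pred_centers) != len(gt_centers) or not pred_centers:
        raise ValueError("need two non-empty sequences of equal length")
    # PD per frame: Euclidean distance between the estimated and
    # ground-truth centers (assumed to be in pixels).
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(pred_centers, gt_centers)
    ]
    n = len(distances)
    return [sum(d < t for d in distances) / n for t in thresholds]


# Hypothetical four-frame sequence; per-frame PDs are 0, 3, 10 and 1.
pred = [(10.0, 10.0), (12.0, 11.0), (20.0, 20.0), (31.0, 29.0)]
gt = [(10.0, 10.0), (12.0, 14.0), (26.0, 28.0), (30.0, 29.0)]
thresholds = [1, 5, 20]  # illustrative values only

curve = precision_curve(pred, gt, thresholds)
print(curve)  # [0.25, 0.75, 1.0]
```

Plotting the returned fractions against the thresholds gives the precision plot; a single precision score is commonly read off at one fixed threshold, though the threshold used in the paper is not stated in this section.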

Summary

Introduction

Single object tracking is a fundamental research area in computer vision and has attracted wide attention because it is applied in many fields, such as autonomous driving, human-computer interaction, video surveillance, and augmented reality [1,2,3,4]. Given the position and extent of an object in the first frame of a video, single object tracking estimates the position and extent of the target in subsequent frames [5]. Besides tracking accuracy, speed is also a criterion for measuring performance [6]. Illumination changes, deformation, and motion make single object tracking challenging [7,8], and many researchers have developed trackers to address these problems.

Results

Discussion

Conclusion