Abstract

Object tracking is one of the most important components in numerous applications of computer vision. Remote sensing videos provided by commercial satellites make it possible to extend this topic into the earth observation domain. In satellite videos, typical moving targets such as vehicles and planes cover only a few pixels and are easily confused with the complex surrounding ground scenes. Because of the limited resolution, similar nearby objects can hardly be distinguished by appearance details, so tracking drift caused by distractors is also a thorny problem. Facing these challenges, traditional tracking methods such as correlation filters with hand-crafted visual features achieve unsatisfactory results in satellite videos. Methods based on deep neural networks have demonstrated their superiority on various ordinary visual tracking benchmarks, but their performance on satellite videos remains unexplored. In this article, deep learning techniques are applied to object tracking in satellite videos for better performance. A simple regression network is used to combine a regression model with convolutional layers and a gradient descent algorithm. The regression network fully exploits the abundant background context to learn a robust tracker. Instead of hand-crafted features, both appearance features and motion features, extracted by pretrained deep neural networks, are used for accurate object tracking. When the tracker encounters ambiguous appearance information, the motion features can provide complementary and discriminative information to improve tracking performance. Experimental results on various satellite videos show that the proposed method achieves better tracking performance than other state-of-the-art methods.
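The idea of expressing a regression model as a convolutional layer trained by gradient descent can be illustrated with a minimal NumPy sketch. It is not the authors' network: it assumes a single-channel input (raw intensities stand in for the pretrained appearance and motion features), one 5 x 5 linear filter, a Gaussian response map as the regression target, and a ridge penalty. The names `gaussian_map`, `patches`, `fit_filter`, and `respond`, and all parameter values, are hypothetical.

```python
import numpy as np


def gaussian_map(h, w, cy, cx, sigma):
    """Gaussian bump centred at (cy, cx) on an h x w grid."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def patches(x, k):
    """All k x k windows of the zero-padded map x, as rows of an (H*W, k*k) matrix."""
    xp = np.pad(x, k // 2)  # zero padding keeps the output the same size as x
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k))
    return win.reshape(x.shape[0] * x.shape[1], k * k)


def fit_filter(x, y, k=5, lam=1e-3, steps=200):
    """Fit one k x k filter by gradient descent on a ridge-regression loss.

    Loss: 0.5 * ||conv(x, w) - y||^2 + 0.5 * lam * ||w||^2
    """
    A = patches(x, k)  # the convolution written as a matrix product
    b = y.ravel()
    w = np.zeros(k * k)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    lr = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)
    losses = []
    for _ in range(steps):
        r = A @ w - b
        losses.append(0.5 * (r @ r) + 0.5 * lam * (w @ w))
        w -= lr * (A.T @ r + lam * w)
    return w.reshape(k, k), losses


def respond(x, kernel):
    """Apply the learned filter to a map and return the response map."""
    k = kernel.shape[0]
    return (patches(x, k) @ kernel.ravel()).reshape(x.shape)


# Synthetic frame (an assumption for illustration): a small bright target
# at row 16, column 20 on a weakly noisy background.
rng = np.random.default_rng(0)
frame = 0.02 * rng.standard_normal((32, 32)) + gaussian_map(32, 32, 16, 20, 1.5)
label = gaussian_map(32, 32, 16, 20, 2.0)  # desired response, peaked on the target

kernel, losses = fit_filter(frame, label)
response = respond(frame, kernel)
peak = np.unravel_index(np.argmax(response), response.shape)
print("loss:", losses[0], "->", losses[-1], "peak:", peak)
```

Because the whole search region, not only the target, contributes rows to the regression, the background context enters the loss directly. In a tracker, the peak of the response map on the next frame would give the new position estimate.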

Highlights

  • Visual object tracking is a fundamental task in computer vision with various practical applications

  • The MDNet and ECO trackers reach A and AO values close to those of the CRAM, showing their fine generalization ability on various kinds of videos. However, their complexity is too high for real-time tracking, and the CRAM is more than ten times faster

  • The good performance of our method owes a great deal to its robustness, which is brought by the combination of appearance and motion features and the regression network framework


Summary

Introduction

Visual object tracking is a fundamental task in computer vision with various practical applications. Given the initialized state (e.g., position and size) of a target object in a frame of a video, the goal of tracking is to estimate the states of the target in the following frames [1]. With the advancement of remote sensing satellite platforms, e.g., the Skysat-1 and Jilin-1 satellites, high-resolution videos that gaze at a specific area of the ground are now available.

Manuscript received November 3, 2019; revised January 15, 2020; accepted January 31, 2020. Date of publication February 11, 2020; date of current version February 27, 2020.

