Abstract

Target tracking has long been a popular research area in computer vision, and many important methods have been proposed. However, most methods can handle only partial and slight occlusion. If the target is lost, a common solution is to keep detecting, reidentify the target when it reappears, and then link the broken tracks together, but this makes tracking discontinuous. The problem has two key aspects: continuous tracking and occlusion judgment. In this paper, we propose a target tracking method with a short-time prediction function to address it. For continuous tracking, we establish a 3D dynamic model to estimate the motion state of the target in each frame. For occlusion judgment, we use a depth prediction network to estimate the depth of the target and then use that depth to determine whether the target is occluded. Without relying on depth sensors or multiple cameras, we estimate depth from a single monocular image, which greatly broadens the applicability of our method. Benefiting from the introduction of motion estimation and depth prediction, our method achieves significantly higher tracking accuracy and, in particular, better robustness to occlusion. Even when the target is completely occluded, it can be tracked for a short time without reidentification. In addition, we speed up depth prediction by 2.08 times through knowledge distillation, and the final tracking speed reaches 52.6 Hz on a GPU, which meets real-time tracking requirements.
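
The abstract does not give the exact form of the 3D dynamic model or of the depth-based occlusion rule. As a rough illustration only, the sketch below assumes a constant-velocity motion model and a fixed depth margin; the names `State3D`, `predict`, `is_occluded`, `track_step`, and the `margin` parameter are hypothetical and are not taken from the paper. It shows how a tracker could coast on its own prediction while something nearer to the camera covers the target, instead of dropping the track and reidentifying later.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class State3D:
    """Target position and velocity; z is the distance from the camera (depth)."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float


def predict(state: State3D, dt: float) -> State3D:
    """Constant-velocity prediction (an assumed stand-in for the 3D dynamic model)."""
    return State3D(
        state.x + state.vx * dt,
        state.y + state.vy * dt,
        state.z + state.vz * dt,
        state.vx,
        state.vy,
        state.vz,
    )


def is_occluded(predicted_depth: float, observed_depth: float, margin: float = 0.5) -> bool:
    """Assumed rule: the target is occluded if the depth estimated at its image
    location is clearly smaller than the predicted depth, i.e. something nearer
    to the camera is in front of it."""
    return observed_depth < predicted_depth - margin


def track_step(
    state: State3D,
    observed: Optional[Tuple[float, float, float]],
    dt: float,
    margin: float = 0.5,
) -> Tuple[State3D, bool]:
    """Advance the track by one frame.

    `observed` is an (x, y, z) measurement whose z would come from a monocular
    depth network, or None if nothing was detected. Returns the new state and
    a flag saying whether the frame was treated as occluded.
    """
    predicted = predict(state, dt)
    if observed is None or is_occluded(predicted.z, observed[2], margin):
        # Occluded: keep tracking on the prediction, no reidentification.
        return predicted, True
    ox, oy, oz = observed
    # Visible: accept the measurement and refresh velocity by finite differences.
    updated = State3D(
        ox, oy, oz,
        (ox - state.x) / dt,
        (oy - state.y) / dt,
        (oz - state.z) / dt,
    )
    return updated, False
```

A real system would replace the finite-difference update with a proper filter and would tune the margin to the noise of the depth network; the point here is only the control flow, in which the occlusion decision selects between measurement and prediction.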

Highlights

  • Modern society produces a large number of videos every day

  • Our goal is to build a real-time tracking system, so the algorithms that we select from MOT17 and MOT20 for comparison all reach a speed of 25 frames per second

  • Our method achieves the best performance on multiple object tracking accuracy (MOTA), IDF1, MT, ML, false negatives (FN), and identity switches (IDS)

Introduction

Modern society produces a large number of videos every day. As an important means of video analysis, video object tracking has a wide range of applications, such as autonomous driving [1], robotics [2], and augmented reality [3]. Great progress has been made in recent years, but most methods assume that the target is visible, so they can handle only partial and slight occlusion. Several approaches address complete occlusion; the most commonly used is reidentification, but it breaks the continuity of tracking. In applications such as online intelligent driving, we cannot afford the consequences of ignoring a target that has disappeared completely. To solve this problem, we can start from two aspects: continuous tracking and occlusion judgment.
