SORT-YM: An Algorithm of Multi-Object Tracking with YOLOv4-Tiny and Motion Prediction

Han Wu, Zhongping Ji, Chenjie Du, Mingyu Gao, Zhiwei He

doi:10.3390/electronics10182319

Abstract

Multi-object tracking (MOT) is a significant and widely studied research field in image processing and computer vision. The goal of the MOT task is to predict the complete tracklets of multiple objects in a video sequence. Many challenges, such as occlusion and similar-looking objects, degrade the performance of a tracking algorithm. Existing MOT algorithms based on the tracking-by-detection paradigm struggle to accurately predict the location of objects that they fail to track in complex scenes, which degrades tracking performance, for example through more ID switches and tracking drift. To tackle these difficulties, we design a motion prediction strategy for predicting the location of occluded objects. Since occluded objects may be visible in earlier frames, we use the speed and location of the objects in past frames to predict the possible location of the occluded objects. In addition, to improve tracking speed and further enhance tracking robustness, we use the efficient YOLOv4-tiny detector to produce the detections in the proposed algorithm, which significantly improves the tracking speed of our method. Experimental results on two widely used public datasets show that the proposed approach has clear advantages in tracking accuracy and speed over the compared algorithms. Compared to the Deep SORT baseline, the proposed method significantly improves tracking performance.
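The idea of using the speed and location of an object in past frames to predict where it is while occluded can be illustrated with a small sketch. This is not the paper's exact formulation, which is not given in this section; it assumes a constant-velocity model, a hypothetical `(cx, cy, w, h)` box layout, and a hypothetical `window` parameter that sets how many past steps are averaged.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (centre x, centre y, width, height).
Box = Tuple[float, float, float, float]


def estimate_velocity(history: List[Box], window: int = 3) -> Tuple[float, float]:
    """Average per-frame displacement of the box centre over the last `window` steps."""
    if len(history) < 2:
        # A single observation carries no motion information.
        return 0.0, 0.0
    recent = history[-(window + 1):]
    steps = len(recent) - 1
    vx = (recent[-1][0] - recent[0][0]) / steps
    vy = (recent[-1][1] - recent[0][1]) / steps
    return vx, vy


def predict_occluded(history: List[Box], frames_lost: int, window: int = 3) -> Box:
    """Extrapolate the last observed box `frames_lost` frames ahead.

    Assumes constant velocity and unchanged box size during the occlusion.
    """
    if not history:
        raise ValueError("history must contain at least one observed box")
    vx, vy = estimate_velocity(history, window)
    cx, cy, w, h = history[-1]
    return (cx + vx * frames_lost, cy + vy * frames_lost, w, h)
```

For example, a track observed at centres (10, 20), (12, 21), and (14, 22) moves by (2, 1) per frame, so `predict_occluded(history, 2)` places the box centre at (18, 24) two frames after the last detection. Keeping the box size fixed is a simplification chosen for this sketch.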

Highlights

Multi-object tracking (MOT), which aims to assign and maintain a unique ID to each object of interest in a video sequence while predicting the location of all objects, is an essential branch of computer vision tasks
Focusing on the problem that it is difficult to predict the location of occluded objects, we design a motion prediction strategy
Since occluded objects may be visible in earlier frames, the speed and location of the objects in past frames are used to predict the location of the lost objects

Summary

Introduction

Multi-object tracking (MOT), which aims to assign and maintain a unique ID for each object of interest in a video sequence while predicting the location of all objects, is an essential branch of computer vision. Actual tracking scenarios present many challenges, including interaction between objects, occlusion, high similarity between different objects, and background interference. Under these challenges, errors such as bounding box drift and ID switches are prone to occur, resulting in tracking performance decay. This paper aims to propose a robust MOT algorithm for complex scenes.

Objectives

Results

Conclusion