Tracking the movement of objects in video, including CCTV footage, plays an integral role in criminal investigations, surveillance, security forecasting, and many other domains. Historically, this task fell to dedicated observers or analysts who were called in to review recorded footage after an event. With advances in machine learning, convolutional neural networks (CNNs) now allow computers to augment human analysis of both streaming video and archived recordings. Among the methods that machine learning offers for this task, YOLO (You Only Look Once) and R-CNN (Region-based Convolutional Neural Networks), along with its successors, stand out as some of the most reliable and precise. However, analysis often needs to go beyond these techniques. To enhance accuracy and provide a more capable classification mechanism, the Deep SORT algorithm, an extension of SORT (Simple Online and Realtime Tracking), emerges as pivotal. Its combination with human detection remains an active area of discussion and is examined in this study. This review aims to explain these state-of-the-art methods and how they interact in modern tracking systems.