Bi-Directional Tracklet Embedding for Multi-Object Tracking

Abstract

The last decade has seen significant advances in multi-object tracking, particularly with the emergence of deep-learning-based methods. However, many prior studies in online tracking have focused primarily on enhancing track management or extracting visual features, often leading to hybrid approaches with limited effectiveness, especially in scenarios with severe occlusions. Conversely, offline tracking has placed little emphasis on robust motion cues. In response, this work presents a novel solution for offline tracking that merges tracklets using recent learning-based architectures. We leverage a jointly operating Transformer and Graph Neural Network (GNN) encoder to integrate both the individual motions of targets and the interactions between them. By enabling bi-directional information propagation between the Transformer and the GNN, the proposed model allows motion modeling to depend on interactions and, conversely, interaction modeling to depend on the motion of each target. The proposed solution is an end-to-end trainable model that eliminates the need for any handcrafted short-term or long-term matching processes. The approach performs on par with state-of-the-art multi-object tracking algorithms, demonstrating its effectiveness and robustness.
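
The abstract does not specify the architecture in detail, so the following is only a toy sketch of what alternating motion and interaction encoding could look like, not the authors' model. It assumes time-aligned tracklets of equal length, uses plain dot-product self-attention over time as a stand-in for the Transformer, and uses mean neighbor aggregation as a stand-in for the GNN. All function names (`temporal_attention`, `interaction_step`, `bidirectional_encode`) and the adjacency format are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(seq):
    """Self-attention over time for ONE tracklet (assumption: identity Q/K/V projections)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(w[t] * seq[t][j] for t in range(len(seq))) for j in range(d)])
    return out


def interaction_step(tracks, adjacency):
    """One message-passing round: average each tracklet with the mean of its neighbors per time step."""
    n, steps, d = len(tracks), len(tracks[0]), len(tracks[0][0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adjacency[i][j]]
        new_seq = []
        for t in range(steps):
            if nbrs:
                msg = [sum(tracks[j][t][k] for j in nbrs) / len(nbrs) for k in range(d)]
            else:
                msg = tracks[i][t]  # isolated tracklet: keep its own feature
            new_seq.append([0.5 * (tracks[i][t][k] + msg[k]) for k in range(d)])
        out.append(new_seq)
    return out


def bidirectional_encode(tracks, adjacency, num_layers=2):
    """Alternate motion and interaction encoding so that each depends on the other."""
    for _ in range(num_layers):
        tracks = [temporal_attention(seq) for seq in tracks]  # motion per target
        tracks = interaction_step(tracks, adjacency)          # interactions across targets
    return tracks


# Three toy tracklets, two time steps, 2-D features; tracklets 0 and 1 interact.
tracks = [
    [[1.0, 0.0], [1.0, 0.5]],
    [[0.0, 1.0], [0.5, 1.0]],
    [[2.0, 2.0], [2.0, 3.0]],
]
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
encoded = bidirectional_encode(tracks, adjacency)
```

With two or more layers, the temporal attention in the second layer already sees features that were mixed across neighbors in the first layer, which is the sense in which motion encoding depends on interactions and vice versa in this sketch.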

Similar Papers
  • Book Chapter
  • Cite Count Icon 14
  • 10.1007/978-3-319-98776-7_38
A Survey of Multi-object Video Tracking Algorithms
  • Nov 5, 2018
  • Shuren Zhou + 3 more

Video multi-object tracking is an important research topic in computer vision and is widely used in military and civil applications. At present, research on single-object tracking algorithms is quite mature, whereas research on multi-object tracking is still ongoing. This paper focuses on four important stages in the multi-object tracking process: feature extraction, detection, data association, and tracking. The feature extraction part introduces current feature extraction methods, along with the merits and demerits of each. For the detection stage, the tracking effect of the object appearance model in specific applications is described, and the paper then analyzes multi-object tracking algorithms based on detection and tracking as well as those based on deep learning. For the tracking stage, the establishment of object motion models and multi-object tracking with hybrid algorithms that combine different trackers are introduced. For the data association stage, the paper introduces multi-object tracking based on energy minimization and commonly used data association algorithms. The current mainstream datasets and evaluation methods are then introduced. Finally, the future development of multi-object tracking is discussed and forecast.

  • Research Article
  • Cite Count Icon 3
  • 10.1088/1742-6596/1871/1/012152
Learning for Graph Matching based Multi-object Tracking in Auto Driving
  • Apr 1, 2021
  • Journal of Physics: Conference Series
  • Yihao Yin + 2 more

Multi-object tracking in autonomous driving aims to represent the trajectories of moving objects for the vehicle's planning system. In this paper, we propose a new tracking-by-detection scheme based on deep neural networks for multi-object detection and tracking in autonomous driving scenes. We first introduce a lightweight neural network branch for fast object detection. Based on the detection results for each frame, we build two object graphs for consecutive frames separately, where the vertices of each graph represent the objects in the image and the edges represent the spatial relations between the objects. We then formulate the multi-object tracking problem as a graph matching process by learning the relevance between objects with a separate object association network branch. Experimental results on the MOT multi-object tracking dataset show that the proposed object detection and tracking approach achieves results comparable to state-of-the-art deep-learning-based multi-object tracking methods and outperforms them in tracking efficiency, which ensures real-time multi-object tracking for autonomous driving.

  • Research Article
  • Cite Count Icon 677
  • 10.1016/j.cviu.2020.102907
UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking
  • Jan 27, 2020
  • Computer Vision and Image Understanding
  • Longyin Wen + 8 more

  • Research Article
  • Cite Count Icon 2
  • 10.1115/1.4050863
CenterTrack3D: Improved CenterTrack More Suitable for Three-Dimensional Objects
  • Apr 1, 2021
  • Journal of Autonomous Vehicles and Systems
  • Lipeng Gu + 3 more

Compared with two-dimensional (2D) multi-object tracking (MOT) algorithms, three-dimensional (3D) multi-object tracking algorithms have greater research significance and broader application prospects in the field of unmanned vehicles. To address 3D multi-object detection and tracking, this paper improves the multi-object tracker CenterTrack, which focuses on the 2D multi-object tracking task while ignoring 3D object information, in both detection and tracking; the improved network is called CenterTrack3D. In terms of detection, CenterTrack3D uses an attention mechanism to optimize the way the previous-frame image and the heatmap of previous-frame tracklets are added to the current-frame image as input, and the second convolutional layer of the hm output head is replaced by a dynamic convolution layer, which further improves the ability to detect occluded objects. In terms of tracking, a cascaded data association algorithm based on a 3D Kalman filter is proposed to make full use of the 3D information of objects in the image and increase the robustness of the 3D multi-object tracker. The experimental results show that, compared with the original CenterTrack and existing 3D multi-object tracking methods, CenterTrack3D achieves 88.75% MOTA for cars and 59.40% MOTA for pedestrians and is very competitive on the KITTI tracking benchmark test set.

  • Research Article
  • 10.17762/turcomat.v12i6.5801
Online Multi-Object Tracking in Videos Based on Features Detected by YOLO
  • Apr 5, 2021
  • Turkish Journal of Computer and Mathematics Education (TURCOMAT)
  • Younis A Al-Arbo, Prof.Dr Khalil I Alsaif

With the rapid development of applications that rely on multi-object detection and tracking, significant attention has been directed toward improving the performance of these methods. Recently, Artificial Neural Networks (ANNs) have shown outstanding performance in various applications, and object detection and tracking are no exception. In this paper, we propose a new object tracking method based on descriptors extracted using the convolutional filters of the YOLOv3 neural network. Because these features are already detected and processed during the detection phase, the proposed method exploits them to produce efficient and robust descriptors. The proposed method shows better performance than state-of-the-art methods, producing better predictions with fewer computations. The evaluation results show that the proposed method processes an average of 207.6 frames per second and tracks objects with 67.6% Multi-Object Tracking Accuracy (MOTA) and 89.1% Multi-Object Tracking Precision (MOTP).

  • Research Article
  • Cite Count Icon 4
  • 10.69996/jcai.2024012
Multi-Object Detection and Tracking with Modified Optimization Classification in Video Sequences
  • Jun 30, 2024
  • Journal of Computer Allied Intelligence
  • Prabu S + 2 more

The paper presents a novel approach to enhancing multi-object detection and tracking in video sequences using a Modified Ant Swarm Optimization Deep Learning (ASO-DL) algorithm. The ASO-DL algorithm combines the optimization capabilities of ant swarm optimization with the feature extraction abilities of deep learning models, resulting in a robust framework for real-time video analytics. Extensive simulations and experiments demonstrate significant improvements in key performance metrics, including accuracy, precision, recall, and F1 score, across various iterations. The proposed method consistently outperforms baseline models, achieving a final best fitness value of 0.96, with an accuracy of 0.98, precision of 0.99, and recall of 0.95. Additionally, classification results across different datasets such as CIFAR-10, IMDB, COCO, and ImageNet highlight the algorithm’s versatility and effectiveness. This research contributes to the field by providing a highly optimized solution for complex multi-object tracking tasks, offering substantial advancements in the accuracy and efficiency of real-time object detection systems. The findings hold significant potential for applications in surveillance, autonomous vehicles, and other domains requiring precise and reliable multi-object tracking.

  • Research Article
  • 10.1002/aisy.202500100
Gaussian Mixture Model‐Based Data Association Incorporating a Deep Learning Network for Multivehicle Tracking and Detection in Autonomous Driving Systems
  • Sep 17, 2025
  • Advanced Intelligent Systems
  • Muhammad Adeel Altaf + 1 more

In autonomous driving systems, 2D and 3D object detection and tracking demand accurate detection, robust affinity computation, and efficient data association in real‐time environments. This article presents a deep learning‐based multivehicle tracking and detection framework that fuses light detection and ranging (LiDAR) and camera data for simultaneous detection and tracking. The proposed system integrates a Gaussian mixture model‐based data association and performs object detection and correlation using 2D images and 3D point cloud inputs. A key contribution of this work is a robust affinity computation module that effectively handles multiple occlusions and models object appearance and motion in 3D space. Additionally, the framework introduces a joint data association strategy that optimizes affinity scores, detection confidence, and start‐end probabilities. Extensive experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute car tracking benchmark demonstrate that the proposed method achieves real‐time performance and superior tracking accuracy, outperforming multiple state‐of‐the‐art LiDAR‐camera fusion methods, including the joint multiobject detection and tracking baseline by up to 1.69% in multiobject tracking precision and 0.10% in multiobject tracking accuracy, while also achieving more stable trajectories and fewer identity switches than boost correlation multiobject detection and tracking.

  • Conference Article
  • Cite Count Icon 1
  • 10.1109/dicta.2017.8227387
A Framework to Combine Multi-Object Video Segmentation and Tracking
  • Nov 1, 2017
  • Sehr Nadeem + 2 more

Multi-object video segmentation and multi-object tracking are very similar in the aspect that both determine the locations and maintain the identities of the objects of interest (targets) in each frame of the video. Our approach takes advantage of this fact and uses the strengths of one task to improve the accuracy of the other. In our framework, the multi-object tracking and segmentation modules initially produce results on our dataset independently. The tracking module enforces higher-order smoothness constraints on the object trajectories and uses Lagrangian relaxation to get an iterative solution method. The segmentation module forms superpixels through clustering, trains a linear SVM using Lab color to obtain the foreground and background segmentation and assigns ID labels based on color and optical flow. The results of these two modules are then jointly processed and updated. The locations of the tracking bounding boxes are refined with the help of the segmentation results, so that they are more precisely centered on the targets. The tracking module is more accurate in terms of ID assignment and hence, its results are used to correct errors in ID labeling in the segmentation module. Both modules identify and add any target detections they initially missed to their results using the results of the other component. Hence, this joint processing increases the accuracy of both the tracking and the segmentation results as can be seen from our experimental results. Our approach is comparable to state-of-the-art tracking and segmentation techniques.

  • Book Chapter
  • Cite Count Icon 2
  • 10.1007/978-3-319-54526-4_35
Actions Recognition in Crowd Based on Coarse-to-Fine Multi-object Tracking
  • Jan 1, 2017
  • Sixue Gong + 3 more

Action recognition has wide applications, from video surveillance and scene understanding to forensic investigation. While recent methods typically focus on recognizing a single action from video clips, we investigate the problem of action recognition in crowds, which better replicates real video surveillance scenarios. We propose to perform action recognition in crowds based on an efficient coarse-to-fine multi-object tracking algorithm. With Faster R-CNN as our human detector, we utilize a coarse-to-fine strategy for multi-object tracking in crowds, consisting of multi-object fast tracking and per-object fine tracking. The tracking results are used to extract the action cuboids, and spatial-temporal features are computed for action classification. We evaluate the proposed approach on a self-collected actions-in-crowd dataset and two public domain databases (CMU and MOT2015). The results show the effectiveness of the proposed approach for multi-action recognition in crowds.

  • Conference Article
  • Cite Count Icon 2
  • 10.1109/cso.2009.378
Multi-object Tracking with Explicit Reasoning about Occlusion
  • Apr 1, 2009
  • Jingling Wang + 4 more

Multi-object tracking in monocular video sequences is a challenging task when objects are occluded and the number of objects is unknown or varies during tracking. In this paper, a multi-object parallel tracking method is proposed based on a Bayesian framework. First, our method is designed to avoid the huge amount of computation required by multi-object joint tracking methods. Second, our method can explicitly reason about occlusions: the depth ordering of interacting objects is inferred. We calculate the observation transition matrix to determine the movement transition between successive frames, given the object observations obtained in each frame. The trackers can also collaborate with one another to decide which object is occluding and which is occluded when occlusion occurs. Our experimental results demonstrate that our method of using multiple trackers can automatically initialize and track a varying number of objects under occlusion.

  • Research Article
  • Cite Count Icon 9
  • 10.3390/app14072690
Vehicle Multi-Object Detection and Tracking Algorithm Based on Improved You Only Look Once 5s Version and DeepSORT
  • Mar 22, 2024
  • Applied Sciences
  • Thioanh Bui + 3 more

The increasing popularity of vehicles has led to traffic congestion and frequent traffic accidents. Intelligent transportation technology is an effective solution to this problem. In order to improve the accuracy and effectiveness of vehicle detection and tracking, this paper combines the improved YOLOv5s model with the optimized DeepSORT tracking algorithm to detect and track vehicles on traffic roads. Firstly, in the detection model of YOLOv5s, the Attention-based Intra-scale Feature Interaction (AIFI) module is introduced to detect vehicles more quickly and accurately. Secondly, the Kalman filtering (KF) algorithm of DeepSORT is optimized to improve the accuracy of predictions of the vehicle state by using the width to replace the length-to-width ratio of the vehicle prediction box in the original KF algorithm. Finally, in the re-recognition network of DeepSORT, the original Convolutional Neural Network (CNN) model is replaced by an improved ResNet36 as the backbone network for feature extraction. The experimental results show that, compared with the original algorithm, in terms of target detection, the recall rate, average accuracy (mAP), and detection speed are increased by 7.7%, 15.5%, and 14.2%, respectively; in terms of multi-object tracking performance, multi-object tracking precision (MOTP) and multi-object tracking accuracy (MOTA) improve by 14.84% and 9.62%, respectively, and the total number of times a trajectory is fragmented (Frag) is reduced by 32.52%. These results indicate that the proposed algorithm can meet the requirements of accuracy, real-time detection, and stable vehicle detection and tracking on traffic roads.

  • Conference Article
  • Cite Count Icon 3
  • 10.1117/12.2262439
A data set for evaluating the performance of multi-class multi-object video tracking
  • May 1, 2017
  • Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
  • Avishek Chakraborty + 4 more

One of the challenges in evaluating multi-object video detection, tracking and classification systems is having publicly available data sets with which to compare different systems. However, the measures of performance for tracking and classification are different. Data sets that are suitable for evaluating tracking systems may not be appropriate for classification. Tracking video data sets typically only have ground truth track IDs, while classification video data sets only have ground truth class-label IDs. The former identifies the same object over multiple frames, while the latter identifies the type of object in individual frames. This paper describes an advancement of the ground truth meta-data for the DARPA Neovision2 Tower data set to allow the evaluation of both tracking and classification. The ground truth data sets presented in this paper contain unique object IDs across 5 different classes of object (Car, Bus, Truck, Person, Cyclist) for 24 videos of 871 image frames each. In addition to the object IDs and class labels, the ground truth data also contains the original bounding box coordinates together with new bounding boxes in instances where un-annotated objects were present. The unique IDs are maintained during occlusions between multiple objects or when objects re-enter the field of view. This will provide: a solid foundation for evaluating the performance of multi-object tracking of different types of objects, a straightforward comparison of tracking system performance using the standard Multi Object Tracking (MOT) framework, and classification performance using the Neovision2 metrics. These data have been hosted publicly.

  • Conference Article
  • Cite Count Icon 3
  • 10.1109/sws.2010.5607349
Real-time integrated multi-object detection and tracking in video sequences using detection and mean shift based particle filters
  • Aug 1, 2010
  • Yuanyuan Jia + 1 more

Object detection and tracking have been studied separately in most cases. This paper presents a new method that integrates generic object detection with a particle-filtering-based tracking algorithm in one consistent framework to achieve real-time, robust multi-object tracking (MOT) in video sequences. By using detection, we can not only perform initialization automatically and dynamically, but also solve the data association problem for MOT easily. To mitigate the degeneracy problem from which most particle filtering methods suffer, we combine resampling, a proposed detection-based optimal importance function, and mean shift mode seeking to make the particles much more efficient and to estimate the posterior density better. The detection result gives the global optimum of the posterior density, while the mean shift mode seeking finds the local optimum. Experimental results show the superior performance of our approach compared with the available tracking methods.

  • Research Article
  • 10.3390/ani15243650
Pig Health Assessment Framework Based on Behavioural Analysis
  • Dec 18, 2025
  • Animals : an Open Access Journal from MDPI
  • Shuqin Tu + 6 more

The long-term behavioural analysis and health assessment of pigs are essential for intelligent management in modern pig farming. Manual tracking and behaviour analysis for constructing health assessment systems are often subjective, inconsistent, and lack sufficient accuracy. To overcome these challenges, this study proposes a health assessment framework for pigs based on multi-object behaviour tracking and analysis under large-scale pig farming. The proposed framework consists of three modules: an improved ByteTrack-based multi-object tracking (MOT) module, a behaviour statistics and analysis module, and a health assessment module. The pipeline involves using the MOT module to obtain pigs' behavioural data, followed by the behaviour analysis module and health assessment module to analyse and evaluate the health status of the pigs. Two datasets comprising 18 videos of healthy pigs and 10 videos of unhealthy pigs were created to validate the framework. Experimental results demonstrated that the improved ByteTrack algorithm achieved high performance in MOT metrics, including a High-Order Tracking Accuracy (HOTA) of 74.0%, Multiple Object Tracking Accuracy (MOTA) of 92.2%, Identification F1 Score (IDF1) of 89.4%, and 43 identity switches (IDs). The behaviour statistics derived from these tracking results enabled reliable inputs for the health assessment model, which accurately assesses the health status of each pig. The results demonstrate that the proposed framework provides an effective solution and reliable technical support for pig health monitoring in modern pig farming.

  • Conference Article
  • 10.1109/icscan53069.2021.9526403
Multi-Object Recognition and Tracking with Automated Image Annotation for Big Data Based Video Surveillance
  • Jul 30, 2021
  • K Vijiyakumar + 2 more

Big Data Analytics is now applied to video surveillance across many domains. In the area of intelligent visual surveillance, tracking is described as finding the path or trajectory of an object in a given video sequence. Multi-object tracking (MOT) has become increasingly popular because of its wide applicability. Generally, MOT is employed to predict the positions of specified objects across multiple consecutive frames, given the ground truth position of each target in the first frame. In this paper, we introduce an improved region-based scalable convolutional neural network (IRS-CNN) based MOT model. The presented IRS-CNN model enhances the existing RS-CNN by incorporating an automated image annotation (AIA) tool to increase the detection rate and reduce computation time. The AIA tool rapidly annotates the training images automatically. The IRS-CNN approach is tested on the benchmark UCSD anomaly detection dataset. Extensive experimental results verify that the IRS-CNN model outperforms the compared methods on a set of test images.
