Abstract

Multiple Object Tracking (MOT) focuses on tracking all the objects in a video. Most MOT solutions follow a tracking-by-detection or a joint detection and tracking paradigm, generating object trajectories by exploiting the correlations between the objects detected in consecutive frames. However, according to our observations, considering only the correlations between the objects in the current frame and those in the previous frame leads to an exponential information decay over time, which results in misidentification of objects, especially in scenes with dense crowds and occlusions. To address this problem, we propose an effective finite-tailed updating (FTU) strategy that generates the appearance template of an object in the current frame by exploiting its local temporal context in the video. Specifically, we model the appearance template of the object in the current frame on its appearance templates in multiple earlier frames and dynamically combine them to obtain a more effective representation. Extensive experiments show that our tracker outperforms the state-of-the-art methods on the MOT Challenge benchmark. We achieve 73.7% and 73.0% IDF1, and 46.1% and 45.0% MT on the MOT16 and MOT17 datasets, which are 0.9% and 0.7% higher in IDF1, and 1.4% and 1.8% higher in MT than FairMOT, respectively.
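The "exponential information decay" mentioned above can be made concrete with a short derivation. It is a sketch under stated assumptions: the update is taken to be a fixed linear combination of the previous template and the current Re-ID feature, normalization is ignored, and the symbols $f_t$ (template at frame $t$), $e_t$ (Re-ID feature at frame $t$) and $\alpha$ (mixing weight) are illustrative names that do not appear in this summary.

```latex
% Assumed linear-combination update with mixing weight 0 < \alpha < 1:
f_t = \alpha\, f_{t-1} + (1-\alpha)\, e_t
% Unrolling the recursion back to the initial template f_0:
f_t = (1-\alpha) \sum_{k=0}^{t-1} \alpha^{k}\, e_{t-k} + \alpha^{t} f_0
% The feature observed k frames ago carries weight (1-\alpha)\,\alpha^{k},
% which shrinks geometrically with k.
```

Under these assumptions, the contribution of any single past observation fades geometrically, and the template at frame $t$ depends on the past only through $f_{t-1}$, so one corrupted template propagates into every later frame.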

Highlights

  • Received: 15 November 2021

  • Multiple Object Tracking (MOT) [1] is one of the hotspots in the field of computer vision

  • The appearance update mechanism of most current MOT methods is formulated as a simple linear combination of the appearance template of the previous frame and the Re-ID feature of the current frame, which causes serious misidentification of the current object if the appearance template of the previous frame is unreliable, for example because of severe occlusion

  • We propose an effective and flexible appearance update mechanism, named finite-tailed updating (FTU), which combines the object’s historical accumulated appearance templates in multiple earlier frames with its Re-ID feature in the current frame to improve the identification performance of the object in the current frame
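The failure mode named in the highlights, where a linear-combination update inherits an unreliable previous template, can be illustrated with a toy example. This is a minimal sketch, not code from the paper: the function names, the 2-D embeddings and the mixing weight `alpha=0.9` are all assumptions chosen for illustration.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def linear_update(prev_template, reid_feat, alpha=0.9):
    """Blend the previous template with the current Re-ID feature.

    alpha is an assumed mixing weight; it is not given in this summary.
    """
    blended = [alpha * p + (1.0 - alpha) * r
               for p, r in zip(prev_template, reid_feat)]
    return normalize(blended)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Toy 2-D embeddings: the true identity points along x, an occluder along y.
true_feat = [1.0, 0.0]
corrupted_template = [0.0, 1.0]  # previous template taken over by an occluder

# Even with a clean Re-ID feature in the current frame, the updated template
# stays close to the corrupted one because it carries most of the weight.
updated = linear_update(corrupted_template, true_feat)
similarity = cosine(updated, true_feat)
print(f"cosine similarity to the true identity: {similarity:.3f}")
```

With these toy values the updated template has a cosine similarity of roughly 0.11 to the true identity, which is the kind of mismatch that can trigger an ID switch during association.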

Summary

Introduction

Multiple Object Tracking (MOT) [1] is one of the hotspots in the field of computer vision. To obtain a promising result in MOT, we need to construct a robust model that generates a group of distinguishable appearance templates for the video frames. In the example considered here, the ID of the person marked with a red arrow changes from 363 to 217 when the tracked object is occluded. To address this issue, we propose an effective and flexible appearance update mechanism named finite-tailed updating (FTU), which combines the object’s historical accumulated appearance templates in multiple earlier frames with its Re-ID feature in the current frame to improve the identification performance of the object in the current frame. We use the Re-ID module and the proposed update mechanism to obtain the actual appearance template of each object in the current frame.
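The idea of combining templates from several earlier frames with the current Re-ID feature can be sketched as a sliding-window update. The class below is hypothetical and is not the FTU rule from the paper: the class name, the window size, the weight `current_weight`, and the uniform average over the window are all assumptions, whereas the paper describes a dynamic combination whose weights are not given in this summary.

```python
import math
from collections import deque

def _normalize(vec):
    """Scale a vector to unit L2 norm (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

class WindowedTemplate:
    """Hypothetical appearance template built from a short window of history."""

    def __init__(self, window=3, current_weight=0.1):
        # Only the templates of the last `window` frames are kept.
        self.history = deque(maxlen=window)
        self.current_weight = current_weight

    def update(self, reid_feat):
        feat = _normalize(reid_feat)
        if not self.history:
            # First observation: the template is the Re-ID feature itself.
            template = feat
        else:
            # Uniform mean of the stored templates (an assumed stand-in for
            # the dynamic combination described in the text).
            count = len(self.history)
            hist_mean = [sum(t[i] for t in self.history) / count
                         for i in range(len(feat))]
            w = self.current_weight
            template = _normalize([w * f + (1.0 - w) * h
                                   for f, h in zip(feat, hist_mean)])
        self.history.append(template)
        return template

# Usage: a few clean frames followed by one frame dominated by an occluder.
tracker_template = WindowedTemplate(window=3, current_weight=0.1)
for _ in range(5):
    template = tracker_template.update([2.0, 0.0])
occluded = tracker_template.update([0.0, 1.0])
print(occluded)
```

In this sketch the template after the occluded frame still points mostly along the true identity, because it is anchored on several earlier templates instead of on a single previous one.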

Multi-Object Tracking Framework
Update Object Appearance Template
Backbone Network
Object Detection Branch
Object-Embedding Branch
Updating the Motion Model
Updating the Appearance Model
The Proposed Updating Scheme
The Architecture of the Proposed MOT Framework
Experimental Setting
Comparison with State-of-the-Art Methods
Hyperparameter Comparison and Analysis Experiments
Comparison to Baseline Method FairMOT
Findings
Conclusions