Abstract
Multiple Object Tracking (MOT) focuses on tracking all the objects in a video. Most MOT solutions follow a tracking-by-detection or a joint detection tracking paradigm to generate the object trajectories by exploiting the correlations between the detected objects in consecutive frames. However, according to our observations, considering only the correlations between the objects in the current frame and the objects in the previous frame will lead to an exponential information decay over time, thus resulting in a misidentification of the object, especially in scenes with dense crowds and occlusions. To address this problem, we propose an effectively finite-tailed updating (FTU) strategy to generate the appearance template of the object in the current frame by exploiting its local temporal context in videos. To be specific, we model the appearance template for the object in the current frame on the appearance templates of the objects in multiple earlier frames and dynamically combine them to obtain a more effective representation. Extensive experiments have been conducted, and the experimental results show that our tracker outperforms the state-of-the-art methods on MOT Challenge Benchmark. We have achieved 73.7% and 73.0% IDF1, and 46.1% and 45.0% MT on the MOT16 and MOT17 datasets, which are 0.9% and 0.7% IDFI higher, and 1.4% and 1.8% MT higher than FairMOT repsectively.
Highlights
Received: 15 November 2021Multiple Object Tracking (MOT) [1] is one of the hotspots in the field of computer vision
The appearance update mechanism of most current MOT methods is formulated as a simple linear combination of the appearance template of the previous frame and the Re-ID switches (IDs) feature of the current frame, which will cause a serious misidentification of the current object if the appearance template of the previous frame is incredible, owing to some severe occlusions
We propose an effective and flexible appearance update mechanism, named finitetailed updating (FTU), which combines the object’s historical accumulated appearance templates in multiple earlier frames with its Re-ID feature in the current frame to improve the identification performance of the object in the current frame
Summary
Multiple Object Tracking (MOT) [1] is one of the hotspots in the field of computer vision. To obtain a promising result in MOT, we need to construct a robust model to generate a group of distinguishable appearance templates for the video frames. Of the person with a red arrow changes from 363 to 217 when the object we are tracking is occluded To address this issue, we propose an effective and flexible appearance update mechanism named finite-tailed updating (FTU). We use the Re-ID module and our proposed update mechanism to obtain the actual appearance template of each object in the current frame. We propose an effective and flexible appearance update mechanism, named finitetailed updating (FTU), which combines the object’s historical accumulated appearance templates in multiple earlier frames with its Re-ID feature in the current frame to improve the identification performance of the object in the current frame.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have