Abstract

Anchor-free one-shot models, which localize detections and extract embeddings by estimating center points within a single network, have proven highly effective for multi-object tracking (MOT). However, the incomplete or unclear appearance of objects renders the semantic feature aggregation in existing one-shot models less effective, which degrades MOT performance. Moreover, these one-shot MOT models often produce erroneous matches between detections and objects because they ignore the influence of historical tracklet cues. Motivated by these issues, we propose a novel hierarchical context-guided network (HCgNet) for one-shot MOT, which performs detection, embedding extraction, and object refinement through hierarchical global-wise, patch-wise, and object-wise processing. Specifically, our method learns temporal and spatial context features in a global-wise and patch-wise manner to guide multi-scale aggregation, thereby locating areas of interest and extracting rich embeddings. In this way, the embedding of each detection encodes contextual relations in addition to semantic information, which reduces the loss of important information about tracked objects. Finally, based on the learned context features, a context-guided object refinement module is designed to learn tracklet embeddings and produce refined objects in each frame, which alleviates erroneous matches between objects and detections. Extensive experiments on several benchmarks, including the 2D MOT2015, MOT17, and MOT20 datasets, demonstrate the effectiveness of our HCgNet.
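The abstract does not specify the aggregation mechanism in detail. As a rough illustration only, the following minimal PyTorch-style sketch shows one plausible way a global context vector could guide multi-scale feature fusion in a one-shot tracker; the module and parameter names (`ContextGuidedAggregation`, `scale_gate`, etc.) are hypothetical and not taken from the paper.

```python
# Illustrative sketch, NOT the paper's architecture: one plausible form of
# context-guided multi-scale aggregation, assuming a PyTorch implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextGuidedAggregation(nn.Module):
    """Fuses multi-scale backbone features, with per-scale weights
    predicted from a pooled global context vector (global-wise guidance)."""
    def __init__(self, in_channels, num_scales, out_channels):
        super().__init__()
        # Project every scale to a common channel width.
        self.proj = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        # Map the global context vector to one weight per scale.
        self.scale_gate = nn.Sequential(
            nn.Linear(out_channels, out_channels),
            nn.ReLU(inplace=True),
            nn.Linear(out_channels, num_scales),
        )

    def forward(self, feats):
        # feats: list of tensors [B, C_i, H_i, W_i], coarse to fine.
        target = feats[-1].shape[-2:]  # resample to the finest resolution
        aligned = [
            F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, feats)
        ]
        stacked = torch.stack(aligned, dim=1)          # [B, S, C, H, W]
        # Global context: pooled over scales and spatial positions.
        ctx = stacked.mean(dim=(1, 3, 4))              # [B, C]
        weights = self.scale_gate(ctx).softmax(dim=1)  # [B, S]
        # Context-weighted sum across scales.
        fused = (stacked * weights[:, :, None, None, None]).sum(dim=1)
        return fused  # [B, C, H, W], fed to detection / embedding heads

if __name__ == "__main__":
    # Toy usage: three scales from a hypothetical backbone.
    feats = [torch.randn(2, c, s, s) for c, s in [(256, 16), (128, 32), (64, 64)]]
    agg = ContextGuidedAggregation([256, 128, 64], num_scales=3, out_channels=64)
    print(agg(feats).shape)  # torch.Size([2, 64, 64, 64])
```

In this sketch the context only gates how much each scale contributes; the paper's patch-wise and object-wise stages would add finer-grained, spatially varying guidance on top of such a global scheme.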
