A simple transformer-based baseline for crowd tracking with Sequential Feature Aggregation and Hybrid Group Training

Cui Wang, Zewei Wu, Wei Ke, Zhang Xiong

doi:10.1016/j.jvcir.2024.104144

Abstract

Tracking pedestrians in crowded scenes is a challenging task. Existing transformer-based tracking methods integrate detection and tracking into a unified model, which simplifies the tracking process. However, these methods also introduce complicated attention mechanisms that increase the model complexity and cost. To address this issue, we propose SOTTrack, a simple online transformer-based method for crowd tracking. Our method enhances feature learning and training strategies without sacrificing simplicity and efficiency. Specifically, we introduce the Sequential Feature Aggregation (SFA) module and the Hybrid Group Training (HGT) approach. The SFA module fuses features from sequential images to improve the temporal consistency of visual features within short time intervals. The HGT approach assigns different queries to multiple guided tasks, such as label assignment and de-noising, which are only used during training and do not incur any inference cost. We evaluate our method on the MOT17 and MOT20 datasets and demonstrate its competitive performance.

More From: Journal of Visual Communication and Image Representation

Similar Papers

Tracking a variable number of pedestrians in crowded scenes by using laser range scanners
Xiaowei Shao ... Ryosuke Shibasaki
01 Oct 2008

CFENet: Content-aware feature enhancement network for multi-person pose estimation
Xixia Xu ... Qi Zou
Applied Intelligence | VOL. 52
26 Apr 2021

Why do drivers fail to see pedestrians and other vulnerable road users?
T Sanocki ... J Doyon
Journal of Vision | VOL. 13
25 Jul 2013

Counting pedestrians in crowded scenes with efficient sparse learning
Masamichi Shimosaka ... Shinya Masuda
01 Nov 2011
