Abstract

3D Multi-Object Tracking (MOT) in dynamic point cloud sequences is a fundamental research problem for downstream tasks such as motion planning and action recognition. Existing methods usually follow the traditional tracking-by-detection (TBD) paradigm, which performs tracking on the results produced by dedicated detectors. However, this two-stage framework cannot sufficiently exploit spatio-temporal information or benefit from end-to-end optimization, leading to sub-optimal tracking performance, especially when objects are partially or completely occluded. In this paper, we propose a joint detection and tracking framework named CenterTube for dynamic point cloud sequences. The key to our approach is to formulate multiple object trajectory prediction as 4D tubelet detection. In particular, the proposed CenterTube is composed of three head branches: a center branch for estimating object centers, a regression branch for object sizes, and a movement branch for instance movements and frame intervals. Additionally, a Tube BEV-IoU (TB-IoU) is presented to link the generated clip-level tubelets into the final tracks. Extensive experiments conducted on the KITTI-MOT and nuScenes datasets demonstrate that our model achieves competitive performance even though no ready-made detection results are adopted.
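To make the tubelet-linking idea concrete, the sketch below shows one plausible reading of TB-IoU-based association: score a pair of clip-level tubelets by their average per-frame bird's-eye-view (BEV) overlap, then greedily append new tubelets to existing tracks. All names here (Tubelet, tube_bev_iou, link_tubelets), the axis-aligned BEV boxes, and the greedy matching rule are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of tubelet linking with a Tube BEV-IoU score.
# The data layout and matching rule are assumptions for illustration;
# the paper's TB-IoU may use rotated boxes and a different matcher.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, l) in BEV, axis-aligned

@dataclass
class Tubelet:
    """A clip-level tubelet: one BEV box per frame index within the clip."""
    boxes: Dict[int, Box]

def bev_iou(a: Box, b: Box) -> float:
    """Axis-aligned IoU between two BEV boxes given as (x, y, w, l)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0.0 else 0.0

def tube_bev_iou(t1: Tubelet, t2: Tubelet) -> float:
    """Average per-frame BEV IoU over the frames the two tubelets share."""
    shared = set(t1.boxes) & set(t2.boxes)
    if not shared:
        return 0.0
    return sum(bev_iou(t1.boxes[f], t2.boxes[f]) for f in shared) / len(shared)

def link_tubelets(tracks: List[List[Tubelet]],
                  new_tubelets: List[Tubelet],
                  thresh: float = 0.3) -> List[List[Tubelet]]:
    """Greedily append each new tubelet to the track whose latest tubelet
    overlaps it most by TB-IoU; start a new track when below `thresh`."""
    for tub in new_tubelets:
        best, best_iou = None, thresh
        for track in tracks:
            iou = tube_bev_iou(track[-1], tub)
            if iou > best_iou:
                best, best_iou = track, iou
        if best is not None:
            best.append(tub)
        else:
            tracks.append([tub])
    return tracks
```

Because tubelets from overlapping clips share frame indices, the shared-frame average gives a natural association score without requiring an explicit motion model; occlusion robustness would come from the detector producing tubelets that span the occluded frames.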
