Abstract
Many 3D object tracking methods based on convolutional neural networks have recently been investigated and applied in a variety of applications. Although most of them have made great progress under partial occlusion, the intricate interweaving of moving agents (e.g., pedestrians and vehicles) can still degrade 3D object tracking performance in complex traffic scenes. To improve 3D object tracking under severe occlusion, we present an end-to-end deep learning framework with a driving behavior-aware model that takes full advantage of spatial-temporal details in consecutive frames and learns driving behavior from object variations in 2D center point, depth, rotation and translation in parallel. In contrast to prior work, our novelty lies in formulating driving behavior so as to reason about the possible motion trajectories of the investigated target for autonomous systems. Experiments show that our method outperforms state-of-the-art approaches to 3D object tracking on the challenging nuScenes dataset.
Highlights
Multi-object tracking (MOT), also called multi-target tracking (MTT), is an essential component technology in many computer vision applications such as autonomous driving [1]–[3] and robot collision prediction [4], [5].
Compared with the state-of-the-art CenterTrack framework, which relies solely on feature representations supervised by object 2D displacement, our driving behavior-aware hierarchical architecture encodes object motion components and object variations in consecutive frames, producing a better, high-level knowledge-based 2D displacement offset for 3D object tracking in complex traffic scenes.
By exploring the object variations in motion components, which consist of 2D center offset, depth offset, rotation and translation offset in consecutive frames, our framework, in contrast to prior work [3], aims to formulate driving behavior for efficient 3D object tracking with a finer 2D displacement.
Summary
Multi-object tracking (MOT), also called multi-target tracking (MTT), is an essential component technology in many computer vision applications such as autonomous driving [1]–[3] and robot collision prediction [4], [5]. Inspired by prior works [28]–[30], we consider a natural formulation in which the movements of road agents with different poses and scales are determined by human driving behavior. Based on this formulation, instead of encoding object center offsets on the 2D plane for 3D tracking [3], we take full advantage of spatial-temporal details across consecutive frames and propose an end-to-end deep learning framework that learns driving behavior from variations in 2D center point, depth, rotation and translation, expressed in the magnitude and direction of hidden-state vectors.
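To make the four motion components named above concrete, the following minimal sketch computes the frame-to-frame variation of one object in 2D center point, depth, rotation and translation. It is an illustration only, not the learned network described in the paper: the names ObjectState, wrap_angle and motion_offsets are hypothetical, and representing rotation as a single yaw angle in radians is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ObjectState:
    """Hypothetical per-frame state of one tracked object (not from the paper)."""
    center_2d: Tuple[float, float]           # image-plane center, pixels
    depth: float                             # distance from the camera, metres
    yaw: float                               # heading angle, radians (assumed representation)
    translation: Tuple[float, float, float]  # 3D position, metres


def wrap_angle(angle: float) -> float:
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def motion_offsets(prev: ObjectState, curr: ObjectState) -> Dict[str, object]:
    """Variation of each motion component between two consecutive frames."""
    return {
        "center_offset": (
            curr.center_2d[0] - prev.center_2d[0],
            curr.center_2d[1] - prev.center_2d[1],
        ),
        "depth_offset": curr.depth - prev.depth,
        # Wrap so that crossing the +/-pi boundary gives a small offset.
        "rotation_offset": wrap_angle(curr.yaw - prev.yaw),
        "translation_offset": tuple(
            c - p for c, p in zip(curr.translation, prev.translation)
        ),
    }


if __name__ == "__main__":
    prev = ObjectState((100.0, 50.0), 20.0, 3.0, (1.0, 0.0, 20.0))
    curr = ObjectState((104.0, 48.0), 19.0, -3.0, (1.5, 0.0, 19.0))
    print(motion_offsets(prev, curr))
```

In the framework summarized above, such variations are learned by the network across consecutive frames rather than computed from known states; the sketch only shows which quantities the four components refer to.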