Abstract

Precisely perceiving the environment from a 3D perspective is essential yet challenging for autonomous navigation and driving. Existing techniques rely on depth from LiDAR data or disparity from stereo vision to tackle the ill-posed problem of detecting distant and occluded objects, which increases structural complexity and computational burden, especially for single-stage systems. We argue that well-established existing detectors have the intrinsic potential to detect objects across the full scene, but that this capability is limited in practice by their structural form and optimization. Hence, we propose a double-branch single-stage monocular 3D object detection framework that aligns the binary centers of objects. Structurally, we construct two symmetrical and independent detectors that use different prediction manners for the 3D box parameters. Functionally, the two detection heads have different sensitivities to the same object owing to disentangled alignment. During training, the detection heads are trained separately to acquire specific abilities and are aligned to promote convergence. At inference, the predictions of the two branches are filtered via depth-aware non-maximum suppression (NMS) to obtain comprehensive detection results. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance in monocular 3D detection on the KITTI-3D benchmark.
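
The abstract names depth-aware NMS as the step that merges and filters the two branches' predictions. Below is a minimal NumPy sketch of one plausible formulation, assuming image-plane boxes augmented with a predicted depth: overlapping detections are suppressed only when their depths also agree within a gap threshold, so distinct objects stacked along the viewing ray can both survive. The function names, box layout, and the `depth_gap` parameter are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def iou_2d(box, boxes):
    """Axis-aligned IoU between one box and an array of boxes,
    each given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def depth_aware_nms(dets, iou_thr=0.5, depth_gap=2.0):
    """Greedy NMS over the merged predictions of both branches.

    dets: (N, 6) array of [x1, y1, x2, y2, depth, score].
    A detection is suppressed only when it overlaps a kept detection
    AND their predicted depths agree within `depth_gap` metres.
    The suppression rule here is an assumption; the abstract only
    names the technique.
    """
    order = np.argsort(-dets[:, 5])  # process highest scores first
    dets = dets[order]
    keep = []
    alive = np.ones(len(dets), dtype=bool)
    for i in range(len(dets)):
        if not alive[i]:
            continue
        keep.append(order[i])  # report index into the original array
        rest = np.where(alive)[0]
        rest = rest[rest > i]
        if rest.size == 0:
            continue
        overlap = iou_2d(dets[i, :4], dets[rest, :4]) > iou_thr
        close = np.abs(dets[rest, 4] - dets[i, 4]) < depth_gap
        alive[rest[overlap & close]] = False
    return keep

# Toy example: both branches fire on the same near car, plus an
# occluded car behind it at a clearly different depth.
dets = np.array([
    [100, 100, 200, 180, 10.0, 0.9],   # branch A, near car
    [102, 101, 198, 179, 10.3, 0.8],   # branch B, same car -> suppressed
    [110, 105, 190, 170, 25.0, 0.7],   # occluded far car -> kept
])
print(depth_aware_nms(dets))  # -> [0, 2]
```

A plain IoU-based NMS would discard the third detection here because it overlaps the kept box in the image plane; gating suppression on depth agreement is what lets the occluded, far-off object survive, matching the motivation stated in the abstract.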
