Abstract

3D vehicle detectors based on point clouds generally achieve higher detection performance than multi-sensor detectors. However, lacking texture information, point-based methods miss many occluded and distant vehicles and produce high-confidence false detections of similarly shaped objects, which poses a potential threat to traffic safety. Therefore, in the long run, fusion-based methods have more potential. This paper presents a multi-level fusion network for 3D vehicle detection from point clouds and images. The fusion network comprises three stages: data-level fusion of point clouds and images, feature-level fusion of voxel and Bird's Eye View (BEV) features in the point cloud branch, and feature-level fusion of point cloud and image features. In addition, a novel coarse-fine detection header is proposed that mimics two-stage detectors, generating coarse proposals on the encoder and refining them on the decoder. Extensive experiments show that the proposed network detects occluded and distant vehicles more reliably and reduces false detections of similarly shaped objects, outperforming several state-of-the-art detectors on the challenging KITTI benchmark. Ablation studies further demonstrate the effectiveness of each designed module.
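The three fusion stages and the coarse-fine header described above can be sketched at a purely schematic level. The sketch below is an assumption for illustration only: the paper's actual fusion operators, feature shapes, and header layers are not specified in this abstract, so simple channel-wise concatenation and linear projections stand in for them.

```python
import numpy as np

def data_level_fusion(points, rgb):
    """Data-level fusion (assumed form): append per-point image color
    to raw point features (e.g. XYZ + intensity), point-painting style."""
    return np.concatenate([points, rgb], axis=1)

def feature_level_fusion(feat_a, feat_b):
    """Feature-level fusion (assumed form): concatenate two spatially
    aligned feature maps (e.g. voxel and BEV, or point-cloud and image
    branches) along the channel axis."""
    return np.concatenate([feat_a, feat_b], axis=0)

def coarse_fine_header(encoder_feat, decoder_feat, w_coarse, w_fine):
    """Coarse-fine header (assumed form): coarse proposals regressed
    from encoder features, then refined by a residual predicted from
    decoder features, imitating a two-stage detector."""
    coarse = encoder_feat @ w_coarse          # coarse proposals on the encoder
    fine = decoder_feat @ w_fine + coarse     # refinement on the decoder
    return coarse, fine

# Toy shapes: 5 points with XYZ+intensity fused with sampled RGB.
fused_points = data_level_fusion(np.zeros((5, 4)), np.ones((5, 3)))
print(fused_points.shape)  # (5, 7)

# 8-channel voxel map fused with 4-channel BEV map on a 10x10 grid.
fused_feat = feature_level_fusion(np.zeros((8, 10, 10)), np.zeros((4, 10, 10)))
print(fused_feat.shape)  # (12, 10, 10)
```

The concatenation-then-project pattern is only one common fusion choice; attention-based or gated fusion would slot into the same interfaces.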
