Abstract

Achieving robust 3D object detection by fusing images and point clouds remains challenging. In this paper, we propose a novel 3D object detector (SimpleFusion) that enables simple and efficient multi-sensor fusion. Our main motivation is to strengthen feature extraction within each single modality and then fuse the resulting features in a unified space. Specifically, in the camera stream we build a new visual 3D object detector that leverages point cloud supervision for more accurate depth prediction; in the LiDAR stream, we introduce a robust 3D object detector that utilizes multi-view and multi-scale features to overcome the sparsity of point clouds. Finally, we propose a dynamic fusion module that uses dynamic weights to emphasize the more confident features and achieve accurate 3D object detection. Our method has been evaluated on the nuScenes dataset, and the experimental results indicate that it outperforms other state-of-the-art methods by a significant margin.
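The abstract does not give implementation details, but as a rough sketch of what a confidence-weighted ("dynamic") fusion step could look like, the following PyTorch snippet fuses camera and LiDAR feature maps with per-location weights. The module name DynamicFusion, the layer choices, and the feature shapes are illustrative assumptions, not the authors' implementation.

# Illustrative sketch of confidence-weighted fusion, assuming camera and LiDAR
# feature maps of matching spatial size. Names and layers are hypothetical.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, cam_channels: int, lidar_channels: int, out_channels: int):
        super().__init__()
        # Project both modalities into a common channel space.
        self.cam_proj = nn.Conv2d(cam_channels, out_channels, kernel_size=1)
        self.lidar_proj = nn.Conv2d(lidar_channels, out_channels, kernel_size=1)
        # Predict per-location weights for the two modalities from the
        # concatenated features; softmax keeps the weights normalized.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * out_channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),
        )

    def forward(self, cam_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        cam = self.cam_proj(cam_feat)
        lidar = self.lidar_proj(lidar_feat)
        weights = self.weight_net(torch.cat([cam, lidar], dim=1))  # (B, 2, H, W)
        # Weighted sum emphasizes the more confident modality at each location.
        return weights[:, 0:1] * cam + weights[:, 1:2] * lidar

if __name__ == "__main__":
    fusion = DynamicFusion(cam_channels=256, lidar_channels=512, out_channels=256)
    cam_feat = torch.randn(2, 256, 180, 180)
    lidar_feat = torch.randn(2, 512, 180, 180)
    print(fusion(cam_feat, lidar_feat).shape)  # torch.Size([2, 256, 180, 180])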
