Multimodal 3D object detection has gained significant attention due to the fusion of light detection and ranging (LiDAR) and RGB data. Existing 3D detection models in autonomous driving are typically trained on dense point cloud data from high-specification LiDAR sensors. However, budgetary constraints often lead to adopting low points-per-second (PPS) LiDAR sensors in real-world scenarios, and a low PPS specification produces a sparse point cloud. Consequently, models trained on dense, high-PPS data cannot achieve optimal performance on sparse point clouds. To address this problem, we propose DenseSphere for robust multimodal 3D object detection under sparse point clouds. Motivated by the data acquisition process of LiDAR sensors, DenseSphere incorporates a spherical coordinate-based point upsampler. Specifically, points are interpolated along the horizontal or vertical direction using bilateral interpolation, and the interpolated points are refined by dilated pyramid blocks with various receptive fields. For efficient fusion with the generated dense point cloud, we use a graph-based detector and hierarchical layers. We demonstrate the performance of DenseSphere through experiments comparing it with other multimodal 3D object detection models. The visual results and source code with the pretrained models are available at https://github.com/Jung-jongwon/DenseSphere.
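To make the spherical coordinate-based upsampling idea concrete, the minimal sketch below is illustrative only and not the paper's method: the function names, the simple midpoint interpolation rule, and the NumPy setup are assumptions. It converts points to spherical coordinates (range, azimuth, elevation), interpolates between neighboring points along the horizontal or vertical direction, and converts back to Cartesian coordinates; DenseSphere itself further refines such interpolated points with dilated pyramid blocks.

```python
import numpy as np

def cartesian_to_spherical(points):
    """Convert N x 3 Cartesian points (x, y, z) to (range, azimuth, elevation)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)                          # horizontal angle
    elevation = np.arcsin(z / np.clip(r, 1e-6, None))   # vertical angle
    return np.stack([r, azimuth, elevation], axis=1)

def spherical_to_cartesian(sph):
    """Convert N x 3 spherical points (range, azimuth, elevation) back to (x, y, z)."""
    r, az, el = sph[:, 0], sph[:, 1], sph[:, 2]
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=1)

def upsample_between_neighbors(points_a, points_b):
    """Interpolate one new point midway between each pair of neighboring points,
    e.g., adjacent LiDAR beams (vertical direction) or adjacent returns within a
    beam (horizontal direction). Naive midpoint averaging in spherical coordinates;
    azimuth wraparound at +/- pi is ignored for simplicity."""
    sph_a = cartesian_to_spherical(points_a)
    sph_b = cartesian_to_spherical(points_b)
    sph_mid = 0.5 * (sph_a + sph_b)
    return spherical_to_cartesian(sph_mid)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beam_lower = rng.uniform(1.0, 10.0, size=(8, 3))          # toy points from one scan line
    beam_upper = beam_lower + np.array([0.0, 0.0, 0.5])       # toy points from the line above
    new_points = upsample_between_neighbors(beam_lower, beam_upper)
    print(new_points.shape)  # (8, 3) interpolated points densifying the gap between lines
```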