Abstract
Autonomous driving perception relies on standard sensors such as color cameras and LiDAR, but each sensor has limitations when sensing complex and diverse environments. As a result, sensor fusion techniques have attracted attention due to their stable perception performance. Previous studies have focused on fusing camera images and LiDAR point clouds for 3D object detection. However, while the camera provides rich texture information at high resolution, the features of objects in sparse point clouds remain underutilized, leaving a gap in the literature. This work proposes a flexible fusion network for 3D object detection. It includes a frustum-aware decorator (FAD) that densifies point clouds and decorates them with texture information. A voxel-wise encoder then extracts point cloud features, which are aligned in a bird's-eye view and fused with camera image features. The fused features are passed sequentially through a region proposal network and a detection head for 3D object detection. Our proposed network achieves leading mean average precision (mAP) of 71.49 and 72.09 in a multi-model comparison on the KITTI and nuScenes 3D object detection benchmarks, respectively. In addition, the novel FAD can be flexibly combined with other state-of-the-art methods. A series of comparison experiments demonstrates that integrating the FAD yields an improvement of at least +2.0 mAP across LiDAR-only and fusion-based 3D object detectors. The source code and tool are available at: https://github.com/denyz/FADN.git.
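To make the described pipeline concrete, the sketch below illustrates the overall data flow (frustum-aware decoration of points, voxel/BEV encoding, camera-LiDAR feature fusion, region proposal network, and detection head) in PyTorch. It is a minimal, hypothetical illustration only: all class names, layer sizes, and tensor shapes are our own assumptions and do not reflect the authors' released implementation at the repository above.

```python
# Minimal, hypothetical sketch of the fusion pipeline described in the abstract:
# frustum-aware decoration -> voxel/BEV encoding -> camera-LiDAR fusion -> RPN -> head.
# All module names, channel sizes, and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class FrustumAwareDecorator(nn.Module):
    """Stand-in for the FAD: combines raw points with image texture features
    sampled along camera frustums (reduced here to a small MLP)."""
    def __init__(self, point_dim=4, texture_dim=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + texture_dim, 64), nn.ReLU(),
            nn.Linear(64, 64),
        )

    def forward(self, points, texture):
        # points: (N, 4) xyz + intensity; texture: (N, 16) sampled image features
        return self.mlp(torch.cat([points, texture], dim=-1))


class FusionDetector(nn.Module):
    """Toy bird's-eye-view fusion: concatenate LiDAR-BEV and image-BEV maps,
    then run an RPN-style conv stack followed by a detection head."""
    def __init__(self, lidar_ch=64, image_ch=64, num_anchors=2):
        super().__init__()
        self.rpn = nn.Sequential(
            nn.Conv2d(lidar_ch + image_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        # 7 box parameters (x, y, z, w, l, h, yaw) + 1 objectness score per anchor
        self.head = nn.Conv2d(128, num_anchors * 8, 1)

    def forward(self, lidar_bev, image_bev):
        fused = torch.cat([lidar_bev, image_bev], dim=1)
        return self.head(self.rpn(fused))


if __name__ == "__main__":
    decorator = FrustumAwareDecorator()
    detector = FusionDetector()
    decorated = decorator(torch.randn(1024, 4), torch.randn(1024, 16))
    preds = detector(torch.randn(1, 64, 200, 176), torch.randn(1, 64, 200, 176))
    print(decorated.shape, preds.shape)  # (1024, 64), (1, 16, 200, 176)
```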