Abstract

Camera-LiDAR 3D object detection has been extensively investigated due to its significance for many real-world applications. However, great challenges remain in addressing the intrinsic data differences between the two modalities and performing accurate cross-modal feature fusion. To these ends, we propose a two-stream architecture, termed CL3D, which integrates a point enhancement module, a point-guided fusion module, and an IoU-aware head for cross-modal 3D object detection. Specifically, a pseudo-LiDAR point cloud is first generated from the RGB image, and a point enhancement module (PEM) is then designed to enhance the raw LiDAR points with the pseudo points. Moreover, a point-guided fusion module (PFM) is developed to establish image-point correspondences at different resolutions and combine semantic with geometric features in a point-wise manner. We also investigate the inconsistency between localization confidence and classification score in 3D detection, and introduce an IoU-aware prediction head (IoU Head) for accurate box regression. Comprehensive experiments are conducted on the publicly available KITTI dataset, where CL3D delivers outstanding detection performance compared to both single- and multi-modal 3D detectors, demonstrating its effectiveness and competitiveness.
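The core of the point-guided fusion idea described above can be illustrated with a minimal sketch. This is not the paper's actual PFM implementation; it is a hypothetical NumPy example assuming a pinhole camera model: each LiDAR point is projected into the image with the camera intrinsics `K`, the image feature map is sampled at the projected pixel (nearest neighbor here, for simplicity), and the sampled semantic feature is concatenated with the point's geometric feature.

```python
import numpy as np

def point_guided_fusion(points, point_feats, img_feats, K):
    """Hypothetical sketch of point-guided fusion: project LiDAR points
    into the image plane and attach sampled image features point-wise.

    points:      (N, 3) xyz coordinates in the camera frame
    point_feats: (N, Cp) geometric features per point
    img_feats:   (H, W, Ci) image feature map
    K:           (3, 3) camera intrinsic matrix
    """
    # Project to homogeneous pixel coordinates, then perspective-divide.
    uvw = points @ K.T                       # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]            # (N, 2) pixel coordinates

    # Nearest-neighbor sampling, clamped to the feature-map bounds
    # (a real implementation would typically use bilinear interpolation).
    H, W, _ = img_feats.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    sem = img_feats[v, u]                    # (N, Ci) sampled semantics

    # Point-wise fusion of geometric and semantic features.
    return np.concatenate([point_feats, sem], axis=1)  # (N, Cp + Ci)
```

In the multi-resolution setting the abstract describes, this sampling step would be repeated on feature maps of several strides (with `uv` scaled accordingly) and the results aggregated per point.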
