Abstract
Multi-sensor, multi-modal fusion has shown significant advantages in 3D object detection. However, existing methods that fuse multi-modal features in the bird's eye view (BEV) space often suffer from feature misalignment, underutilization of semantic information, and inaccurate depth estimation along the Z-axis, resulting in suboptimal performance. To address these issues, we propose Bi-Interfusion, a novel transformer-based multi-modal fusion framework. Bi-Interfusion adopts a bidirectional fusion architecture built from three components: Pixel-wise Semantic Painting, a Gaussian Depth Prior Distribution module, and a Semantic Guidance Align module. Specifically, Bi-Interfusion employs a bidirectional cross-fusion strategy to merge image and LiDAR features and generate multi-sensor BEV features. In one direction, a refined Gaussian Depth Prior Distribution generated from LiDAR points improves the precision of the view transformation. In the other, a pixel-wise semantic painting technique embeds image semantic information into the LiDAR point cloud, supporting a more comprehensive understanding of the scene. Finally, a transformer-based model establishes soft correspondences among the multi-sensor BEV features, capturing positional dependencies and fully exploiting semantic information for alignment. In experiments on the nuScenes benchmark dataset, Bi-Interfusion achieves a competitive 72.6% mAP and 75.4% NDS on the 3D object detection task.
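To make the idea of a LiDAR-derived Gaussian depth prior concrete, the following is a minimal sketch, not the paper's module. It assumes that each image pixel hit by a projected LiDAR point receives a normalized Gaussian weighting over discrete depth bins, centered on the measured depth. The function name `gaussian_depth_prior`, the bin layout, and the value of `sigma` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def gaussian_depth_prior(lidar_depth, bin_centers, sigma=0.5):
    """Return one normalized Gaussian weight per depth bin.

    lidar_depth: depth in meters measured by a LiDAR point (assumed input).
    bin_centers: centers of the discrete depth bins, in meters (assumed layout).
    sigma: spread of the prior in meters (assumed hyperparameter).
    """
    # Unnormalized Gaussian weights centered on the LiDAR depth.
    weights = [
        math.exp(-0.5 * ((c - lidar_depth) / sigma) ** 2) for c in bin_centers
    ]
    total = sum(weights)
    if total == 0.0:
        # The depth lies far outside the bin range, so fall back to a uniform prior.
        return [1.0 / len(bin_centers)] * len(bin_centers)
    return [w / total for w in weights]


# Hypothetical depth bins: 1 m to 60 m at 1 m spacing.
bin_centers = [float(d) for d in range(1, 61)]
prior = gaussian_depth_prior(12.3, bin_centers, sigma=0.5)

# Index of the bin that receives the most probability mass.
peak_bin = max(range(len(prior)), key=prior.__getitem__)
```

In a lift-style view transformation, a prior of this kind could be combined with a depth distribution predicted from the image so that pixels with LiDAR coverage are lifted mostly into the bins near the measured depth; how Bi-Interfusion actually combines the two is not described in the abstract.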