Abstract
Combining LiDAR points and images for robust semantic segmentation has shown great potential. However, the heterogeneity between the two modalities (e.g. the density, the field of view) poses challenges in establishing a bijective mapping between each point and pixel. This modality alignment problem introduces new challenges in network design and data processing for cross-modal methods. Specifically, 1) points that are projected outside the image planes; 2) the complexity of maintaining geometric consistency limits the deployment of many data augmentation techniques. To address these challenges, we propose a cross-modal knowledge imputation and transition approach. First, we introduce a bidirectional feature fusion strategy that imputes missing image features and performs cross-modal fusion simultaneously. This allows us to generate reliable predictions even when images are missing. Second, we propose a Uni-to-Multi modal Knowledge Distillation (U2MKD) framework, leveraging the transfer of informative features from a single-modality teacher to a cross-modality student. This overcomes the issues of augmentation misalignment and enables us to train the student effectively. Extensive experiments on the nuScenes, Waymo, and SemanticKITTI datasets demonstrate the effectiveness of our approach. Notably, our method achieves an 8.3 mIoU gain over the LiDAR-only baseline on the nuScenes validation set and achieves state-of-the-art performance on the three datasets.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE transactions on pattern analysis and machine intelligence
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.