Abstract
Despite the significant growth in the availability of 3D light detection and ranging (LiDAR) point cloud data in recent years, annotation remains expensive and time-consuming. This has led to an increasing demand for weakly-supervised semantic segmentation (WSSS) methods in applications such as autonomous driving, mapping, and robotics. Existing approaches typically rely solely on LiDAR point cloud data for WSSS, which often results in lower segmentation accuracy due to the sparsity of point clouds. To address these challenges, we propose a novel architecture, PPDistiller, which employs multiple teacher networks from different modalities. Compared to other WSSS and multimodal approaches, PPDistiller achieves superior segmentation accuracy with fewer annotations. This is achieved through the novel Mean Multi-Teacher (MMT) framework, which incorporates multiple modalities and teachers. To address the lack of 2D labels, we propose the Distance-CAM Self-Training (DCAM-ST) module, which utilizes sparse weak 3D annotations to produce accurate 2D pixel-level annotations. To enable adaptive fusion of 2D and 3D data, we introduce the Attention Point-to-Pixel Fusion (APPF) module, which facilitates bidirectional transfer of cross-modal knowledge. Additionally, to fully leverage the spatial semantic information in point clouds, we propose the Pyramid Semantic-context Neighbor Aggregation (PSNA) module, which exploits spatial and semantic correlations to improve performance. Extensive experiments on the SemanticKITTI, ScribbleKITTI, and nuScenes datasets demonstrate the superiority of the proposed method: compared to state-of-the-art fusion and weakly-supervised methods, PPDistiller achieves the highest mean Intersection over Union (mIoU) scores under both fully-supervised and weakly-supervised settings.
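The name "Mean Multi-Teacher" suggests a mean-teacher-style scheme in which each modality-specific teacher is maintained as an exponential moving average (EMA) of its student, extended here to one teacher per modality. The sketch below illustrates that update rule under this assumption only; the module names, stand-in backbones, and the smoothing coefficient alpha are illustrative and not taken from the paper.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, alpha: float = 0.999) -> None:
    """Standard mean-teacher EMA update: theta_t <- alpha * theta_t + (1 - alpha) * theta_s."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)


# Hypothetical multi-teacher setup: one student/teacher pair per modality
# (2D image branch and 3D point cloud branch). nn.Linear modules stand in
# for the real segmentation backbones.
student_2d, student_3d = nn.Linear(16, 4), nn.Linear(16, 4)
teacher_2d, teacher_3d = copy.deepcopy(student_2d), copy.deepcopy(student_3d)

# After each student optimization step, every teacher tracks its own student.
for teacher, student in [(teacher_2d, student_2d), (teacher_3d, student_3d)]:
    ema_update(teacher, student, alpha=0.999)
```

Keeping a slowly-moving EMA teacher per modality is a common way to obtain stable targets for distillation; how PPDistiller combines the teachers' outputs is described in the paper itself.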