ABSTRACT Three-dimensional (3D) semantic segmentation based on point clouds has become a critical technology for intelligent spatial perception and scene understanding. However, most existing methods cannot fully exploit both the spatial features of 3D point cloud data and the ordered structure of two-dimensional (2D) image information, which limits segmentation accuracy. This study proposes a novel fusion method for 2D images and 3D point clouds, named shared multi-layer perceptron fusion semantic segmentation (SMFusionSeg). The rich features of 2D image segmentation and the spatial information of 3D point clouds are integrated through a fusion architecture that transfers 2D semantics to the 3D domain. Specifically, the fusion architecture concatenates the features from the 2D and 3D domains, which are then further integrated through a shared multi-layer perceptron (MLP) and refined with an attention mechanism to enhance feature effectiveness and distinctiveness. Moreover, a knowledge distillation strategy is introduced to improve learning capability and segmentation accuracy when parsing complex scenes. Experimental results on the SemanticKITTI dataset demonstrate that the proposed SMFusionSeg achieves a mean Intersection over Union (mIoU) of 65.4%, significantly outperforming traditional single-modal methods and many existing segmentation techniques, thereby effectively enhancing the precision and robustness of geographic feature recognition.
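The abstract names three ingredients of the fusion architecture: channel-wise concatenation of per-point 2D and 3D features, a shared MLP, and an attention mechanism. The PyTorch sketch below illustrates one plausible realization of such a block; the class name FusionBlock, all feature dimensions, and the squeeze-and-excitation style channel attention are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Hypothetical 2D-3D fusion block: concatenates per-point 2D and 3D
    features, mixes them with a shared MLP, and re-weights channels with
    an attention gate. Dimensions are illustrative, not from the paper."""

    def __init__(self, dim_2d=64, dim_3d=64, dim_out=128):
        super().__init__()
        # Shared MLP over points: a 1x1 Conv1d applies the same weights
        # to every point, the standard "shared MLP" idiom.
        self.shared_mlp = nn.Sequential(
            nn.Conv1d(dim_2d + dim_3d, dim_out, kernel_size=1),
            nn.BatchNorm1d(dim_out),
            nn.ReLU(inplace=True),
        )
        # Channel attention (assumed squeeze-and-excitation style):
        # global pooling -> bottleneck -> sigmoid gate per channel.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(dim_out, dim_out // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(dim_out // 4, dim_out, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_2d, feat_3d):
        # feat_2d: (B, dim_2d, N) image features projected onto N points
        # feat_3d: (B, dim_3d, N) point-cloud features for the same points
        fused = self.shared_mlp(torch.cat([feat_2d, feat_3d], dim=1))
        return fused * self.attention(fused)  # attention-weighted features

# Example: batch of 2 scans, 4096 points each
block = FusionBlock()
out = block(torch.randn(2, 64, 4096), torch.randn(2, 64, 4096))
print(out.shape)  # torch.Size([2, 128, 4096])
```

The sketch assumes the 2D features have already been projected onto the 3D points (e.g., via camera-to-LiDAR calibration), since the abstract describes fusion of per-point features from both domains but does not specify the projection step.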