ABSTRACT In recent years, deep learning-based semantic segmentation of point clouds has demonstrated remarkable capabilities in processing 3D urban scenes for applications such as three-dimensional reconstruction, semantic modeling, and augmented reality. However, research on grotto scenes is very limited. It remains unclear how well existing neural architectures for point cloud semantic segmentation perform in grotto scenes, and how the unique characteristics of such scenes can be incorporated effectively to enhance the performance of deep neural networks. This study proposes a method for point cloud semantic segmentation of grotto scenes that combines knowledge with deep learning. The method uses knowledge to guide three steps: the creation of benchmark datasets, the design of a neural network called GSS-Net, and the correction of segmentation errors in the deep learning results. The results show that, even without the correction of segmentation results, the proposed method outperforms four existing mainstream models. Moreover, a set of ablation studies verifies the effectiveness of each proposed module. The method not only improves the accuracy of point cloud semantic segmentation in grotto scenes but also enhances the interpretability of the network design. It provides new insights into the application of knowledge-guided deep learning models in grotto scenes.