Despite the significant growth in the availability of 3D light detection and ranging (LiDAR) point cloud data in recent years, annotation remains expensive and time-consuming. This has led to an increasing demand for weakly-supervised semantic segmentation (WSSS) methods in applications such as autonomous driving, mapping, and robotics. Existing approaches typically rely solely on LiDAR point cloud data for WSSS, which often results in lower segmentation accuracy due to the sparsity of point clouds. To address these challenges, we propose a novel architecture, PPDistiller, which employs multiple teacher networks from different modalities. Compared with other WSSS and multimodal approaches, PPDistiller achieves superior segmentation accuracy with fewer annotations, enabled by the novel Mean Multi-Teacher (MMT) framework that integrates multiple modalities and teachers. To address the lack of 2D labels, we propose the Distance-CAM Self-Training (DCAM-ST) module, which leverages sparse 3D weak annotations to produce accurate 2D pixel-level annotations. To enable adaptive fusion of 2D and 3D data, we introduce the Attention Point-to-Pixel Fusion (APPF) module, which facilitates bidirectional transfer of cross-modal knowledge. In addition, to fully exploit the spatial semantic information in point clouds, we propose the Pyramid Semantic-context Neighbor Aggregation (PSNA) module, which leverages spatial and semantic correlations to improve performance. Extensive experiments on the SemanticKITTI, ScribbleKITTI, and nuScenes datasets demonstrate the superiority of the proposed method. Compared with state-of-the-art fusion and weakly-supervised methods, PPDistiller achieves the highest mean Intersection over Union (mIoU) scores under both fully-supervised and weakly-supervised settings.