Abstract

Unsupervised domain adaptation (UDA) is increasingly used for 3D point cloud semantic segmentation because it addresses the lack of labels in new domains. However, most existing UDA methods focus on uni‐modal data and are rarely applied to multi‐modal data. We therefore propose a cross‐modal UDA method for 3D semantic segmentation on multi‐modal datasets that contain both 3D point clouds and 2D images. Specifically, we first propose a Dual discriminator‐based Domain Adaptation (Dd‐bDA) module to improve adaptation across domains. Second, because depth information is robust to domain shifts and provides additional detail for semantic segmentation, we employ a Dense depth Feature Fusion (DdFF) module to extract image features with rich depth cues. We evaluate our model in four UDA scenarios: dataset‐to‐dataset (A2D2 → SemanticKITTI), day‐to‐night, country‐to‐country (USA → Singapore), and synthetic‐to‐real (VirtualKITTI → SemanticKITTI). In all four settings, our method achieves significant improvements and surpasses state‐of‐the‐art models.