This study introduces a novel approach that addresses the limitations of existing methods by integrating 2D image processing with 3D point cloud analysis, enhanced by interpretable neural networks. Unlike traditional methods that rely on either 2D or 3D data alone, our approach leverages the complementary strengths of both data types to improve detection accuracy in environments adversely affected by welding spatter and smoke. Our system employs an improved Faster R-CNN model with a ResNet50 backbone for 2D image analysis, coupled with an innovative orthogonal plane intersection line extraction algorithm for 3D point cloud processing. By incorporating explainable components such as visualizable feature maps and a transparent region proposal network, we address the “black box” issue common in deep learning models. This architecture enables a more transparent decision-making process, giving technicians the insights needed to understand and trust the system’s outputs. The Faster R-CNN structure breaks the object detection process into distinct, understandable steps, from initial feature extraction to final bounding box refinement. This fusion of 2D-3D data analysis and interpretability not only improves detection performance but also sets a new standard for transparency and reliability in automated welding systems, facilitating wider adoption in industrial applications.
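The core geometric idea behind the orthogonal plane intersection line extraction can be illustrated with a minimal sketch: fit a plane to each of the two point cloud segments, then recover the weld seam line as the intersection of the two planes (its direction is the cross product of the plane normals). The function names, the synthetic point data, and the least-squares fitting choice below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fit_plane(points):
    # Least-squares plane fit via SVD: returns unit normal n and offset d
    # such that n . x + d = 0 for points x on the plane.
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]                      # singular vector with smallest singular value
    d = -n @ centroid
    return n, d

def plane_intersection_line(n1, d1, n2, d2):
    # Direction of the intersection line: cross product of the two normals.
    direction = np.cross(n1, n2)
    direction /= np.linalg.norm(direction)
    # One point on the line: solve both plane equations plus a third
    # constraint pinning the point to the plane through the origin
    # perpendicular to the line direction.
    A = np.stack([n1, n2, direction])
    b = np.array([-d1, -d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction

# Hypothetical data: two noisy orthogonal planes (y = 0 and x = 0)
# meeting along the z-axis, mimicking a fillet-joint point cloud.
rng = np.random.default_rng(0)
p1 = np.column_stack([rng.uniform(-1, 1, 200), np.zeros(200), rng.uniform(-1, 1, 200)])
p2 = np.column_stack([np.zeros(200), rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)])
p1 += rng.normal(0, 1e-3, p1.shape)
p2 += rng.normal(0, 1e-3, p2.shape)

point, direction = plane_intersection_line(*fit_plane(p1), *fit_plane(p2))
```

For the synthetic data above, the recovered line should be approximately the z-axis: `direction` close to (0, 0, ±1) and `point` close to the origin in x and y. In practice, a robust estimator (e.g. RANSAC) would replace the plain least-squares fit to tolerate spatter-induced outliers.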