Abstract To rapidly identify and locate the weld seam initial point in robotic automated welding, we built a binocular vision system and propose a weld seam initial point localization algorithm named WIPL-Net. Built upon the Fully Convolutional One-Stage (FCOS) object detection network, WIPL-Net adopts a lightweight ResNeXt as its backbone and incorporates channel attention and an enhanced feature fusion mechanism to improve feature extraction and detection. WIPL-Net detects the weld seam initial point in the left and right images, and the point's three-dimensional coordinates are then computed by binocular triangulation. To further estimate the robot's pose at the initial point, we perform a sparse three-dimensional reconstruction of the local region centered on the weld seam initial point, based on You Only Look At CoefficienTs (YOLACT) instance segmentation and feature point matching. Finally, we ran comparative experiments on WIPL-Net and weld seam initial point localization experiments in real welding scenarios. The results show that the proposed method achieves a positioning error of less than 1.2 mm for the weld seam initial point and a pose error of less than 10 degrees for the robot, meeting the requirement for real-time positioning of the weld seam initial point.
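As an illustration of how a detected initial point can be lifted from a pair of binocular images to three-dimensional coordinates, the following is a minimal sketch of triangulation for an idealized stereo rig. It assumes rectified images, a pinhole model, and a shared focal length and principal point for both cameras. The names `StereoRig` and `triangulate_rectified` and all numeric values are hypothetical and are not taken from this work, whose actual calibration and measurement procedure may differ.

```python
from dataclasses import dataclass


@dataclass
class StereoRig:
    """Idealized rectified stereo rig (all parameters are illustrative assumptions)."""
    focal_px: float     # focal length in pixels, assumed equal for both cameras
    baseline_mm: float  # distance between the two optical centers, in mm
    cx: float           # principal point x-coordinate, in pixels
    cy: float           # principal point y-coordinate, in pixels


def triangulate_rectified(rig, u_left, v_left, u_right):
    """Return (X, Y, Z) in mm, in the left-camera frame, for one matched point.

    u_left, v_left: pixel coordinates of the point in the left image.
    u_right: column of the same point in the right image (same row after rectification).
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = rig.focal_px * rig.baseline_mm / disparity  # depth from similar triangles
    x = (u_left - rig.cx) * z / rig.focal_px        # back-project the pixel column
    y = (v_left - rig.cy) * z / rig.focal_px        # back-project the pixel row
    return (x, y, z)


if __name__ == "__main__":
    rig = StereoRig(focal_px=1000.0, baseline_mm=100.0, cx=640.0, cy=360.0)
    # Hypothetical detections of the same initial point in the left and right images.
    print(triangulate_rectified(rig, u_left=740.0, v_left=360.0, u_right=540.0))
    # prints (50.0, 0.0, 500.0)
```

Because depth is inversely proportional to disparity, a fixed pixel error in the detected point produces a depth error that grows with the square of the working distance, which is why sub-pixel detection accuracy matters for millimeter-level positioning.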