Machine-vision-based defect detection for large metal stampings is essential for improving product quality and inspection speed. However, its performance is limited by the large variety of stamped products, their significant dimensional differences, and long data-acquisition cycles. To overcome these problems, defect detection was performed using an improved YOLOv5 model, which features a slim neck and a multi-head self-attention mechanism, to detect wrinkles, holes, and cracks in metal stampings. An image-processing module was used to reduce the impact of metal reflections and improve image quality. An improved Stable Diffusion model was used to augment the dataset, overcoming the small-dataset problem and enhancing the detector's generalisation capability. In addition, a BotNet structure with a self-attention mechanism was introduced into the YOLOv5 backbone to improve image feature extraction. We then optimised the prediction head structure of YOLOv5 to improve detection speed and accuracy. Ablation experiments were performed to analyse and verify the effectiveness of each module. The results show that, with the proposed improved Stable Diffusion data augmentation method and the YOLO-Bot-VOV algorithm, the mAP for defect detection in metal stamped parts reached 98.2%, and the number of parameters was reduced by 0.432 million.
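For readers less familiar with the reported metric, the following is a minimal sketch of how a mean average precision (mAP) value can be computed over the three defect classes. It is not the evaluation code used in this work. It assumes all-point interpolated average precision, and it assumes that each detection has already been matched against the ground truth (for example, by an IoU threshold, which is not specified here) and flagged as a true or false positive. The function names and the `demo` data are hypothetical and purely illustrative.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class (assumed definition).

    scores            -- confidence of each detection
    is_true_positive  -- True if the detection matched a ground-truth defect
    num_ground_truth  -- number of ground-truth defects of this class
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: each precision becomes the maximum precision
    # achieved at that recall level or any higher one.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """Mean of the per-class AP values."""
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical detections: (scores, true-positive flags, ground-truth count).
demo = {
    "wrinkle": ([0.9, 0.8, 0.7], [True, False, True], 2),
    "hole": ([0.95, 0.6], [True, True], 2),
    "crack": ([0.8, 0.5], [False, True], 1),
}
demo_map = mean_average_precision(demo)
```

Under this definition, a class scores 1.0 only when every ground-truth defect is found and no false positive is ranked above a true positive. A high mAP therefore reflects both few missed defects and few confident false alarms.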