Object detection networks applied to defect detection in thangka images with complex background colors suffer from poor small-target detection, insufficient feature extraction, frequent false and missed detections, and low detection accuracy. To address these problems, this paper proposes a YOLOv5 defect detection algorithm that combines an attention mechanism with receptive-field expansion. First, the Backbone network performs feature extraction and integrates an attention mechanism to represent different features, so that the network can fully extract the texture and semantic features of the defect area; the extracted features are then weighted and fused to reduce information loss. Second, the Neck network passes on a weighted fusion of features of different dimensions and combines FPN and PAN to fuse the semantic and texture features of different layers, locating defect targets more accurately. Finally, the GIoU loss function is replaced with CIoU and a receptive field is added to the network, so that the algorithm uses a four-channel detection mechanism to expand the detection range of the receptive fields and fuses semantic information between different network layers, achieving fast localization and more refined processing of small targets. Experimental results show that, compared with the original YOLOv5 network, the detection accuracy of the proposed YOLOv5-scSE and YOLOv5-CA networks improves by 9.96 and 12.22 percentage points, respectively, and the evaluation metrics improve significantly. The proposed networks identify and locate defect areas more quickly and accurately and generalize better across defect categories, which greatly improves the accuracy of thangka image defect detection.
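
To make the change of loss function concrete, the following is a minimal sketch of the GIoU and CIoU losses for a single pair of axis-aligned boxes, written in plain Python. It follows the commonly published definitions of the two losses and is not taken from this paper's training code. The function names (`_iou_parts`, `giou_loss`, `ciou_loss`) and the `(x1, y1, x2, y2)` box format are assumptions made for illustration, and boxes are assumed to have positive width and height.

```python
import math


def _iou_parts(a, b):
    """Return (iou, union_area, enclosing_box) for two (x1, y1, x2, y2) boxes."""
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Smallest axis-aligned box that encloses both inputs.
    enc = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return inter / union, union, enc


def giou_loss(pred, gt):
    """GIoU loss: 1 - (IoU - share of the enclosing box not covered by the union)."""
    iou, union, enc = _iou_parts(pred, gt)
    enc_area = (enc[2] - enc[0]) * (enc[3] - enc[1])
    return 1.0 - (iou - (enc_area - union) / enc_area)


def ciou_loss(pred, gt):
    """CIoU loss: 1 - IoU + center-distance penalty + aspect-ratio penalty."""
    iou, _, enc = _iou_parts(pred, gt)
    # Squared distance between the two box centers.
    dx = ((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2.0
    dy = ((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2.0
    rho2 = dx * dx + dy * dy
    # Squared diagonal of the enclosing box normalizes the distance term.
    c2 = (enc[2] - enc[0]) ** 2 + (enc[3] - enc[1]) ** 2
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    v = (4.0 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / (1.0 - iou + v) if v > 0 else 0.0
    return 1.0 - (iou - rho2 / c2 - alpha * v)


if __name__ == "__main__":
    pred, gt = (0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0)
    print("GIoU loss:", giou_loss(pred, gt))
    print("CIoU loss:", ciou_loss(pred, gt))
```

In this sketch, the CIoU loss penalizes the distance between box centers and the mismatch in aspect ratio directly, in addition to the overlap term, whereas the GIoU loss depends only on overlap and the enclosing-box area.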