Abstract

Detecting steel-surface defects is a crucial phase in steel manufacturing; however, performing the detection task accurately is challenging. The Swin Transformer, a self-attention-based model, has shown strong performance in the field of computer vision. To enhance the adaptability of the Swin Transformer to the task of steel-surface defect detection, a new network architecture called the LSwin Transformer is proposed in this study. First, in the downsampling process, we propose a convolutional embedding module and an attention patch merging module, which simultaneously strengthen the connections between the feature map channels, reduce the resolution, and increase image information retention. Second, we propose an effective window shift strategy and a convenient computation approach that give a defect split across patches more opportunities for interactive computation. Finally, to combine the feature extraction capability of convolutional neural networks with the global dependency building capability of the Swin Transformer, we propose a depth multilayer perceptron module. Extensive experiments were conducted on a steel-surface defect dataset. The results demonstrated that our model outperformed competing methods, achieving a mean average precision of 81.2%. In the ablation study, we verified the effectiveness of each module, and we initialized the model parameters through transfer learning to accelerate convergence. Therefore, the proposed LSwin Transformer has significant potential for detecting steel-surface defects.
