Abstract

Although surface defect detection for roller bearings has received much attention in smart industrial manufacturing, most existing approaches apply off-the-shelf convolutional or Transformer models without modification. This is undesirable because classification models do not capture the spatial relationships among multiple objects within an image. To develop a model specialized for bearing defect detection, we extend the Swin Transformer. This paper presents a shunted-window Transformer (sSwin), which replaces the shifted-window mechanism with shunted large-small windows. In sSwin, the self-attention heads capture features at multiple token granularities within the same self-attention layer. In addition, a proposed local connection module enhances the boundary interaction of adjacent value tokens. Because tokens within one attention layer have receptive fields of different scales, the model achieves significant performance gains on defect detection tasks. Experiments show that sSwin-T outperforms the classical Swin-T by +2.8% APb and +1.9 APm, and Swinv2-T by +1.0% APb and +1.1% APm, on COCO. It also surpasses them by +1.1% APb and +3.4% APb, respectively, on the private bearing dataset with Cascade Mask-RCNN 3x.
