Abstract

Surface defect detection is pivotal for ensuring product quality across many industries. Such defects typically exhibit low contrast against the background, substantial variation in shape, and a scarcity of balanced training samples. Traditional defect detection methods often suffer from low detection accuracy and insufficient detection granularity. To overcome these challenges, this study introduces an attentive semantic feature fusion transformer network (ASFormer) tailored to pixel-level defect detection tasks. First, a transformer architecture is employed to extract multiscale defect features. Next, a multiscale semantic fusion module, complemented by a dynamic upsampling mechanism, is integrated to mitigate the loss of detailed information during feature fusion. Moreover, a scale-aware dual-attention module is developed that captures the intricacies of the fused features across both channel and spatial dimensions, ensuring pixel-level detection precision. Finally, a contextual boundary loss function is proposed to strengthen the network's ability to discern defect boundaries, categories, and scales. Experimental validation on the NEU-Seg and Crack500 surface defect datasets demonstrates that the proposed ASFormer achieves state-of-the-art performance, with mIoU scores of 85.24% and 78.20%, respectively.