Efficient and accurate visual inspection of products is of great significance for intelligent manufacturing. In this paper, a visual inspection framework based on a lightweight Transformer is proposed for pixel-level inspection of tire defects. A dual-path Transformer feature encoder, built on a hybrid convolutional neural network (CNN) and Transformer architecture, is proposed to learn both the local and the global relationships among defect features. Moreover, a multi-scale fusion Transformer (MFT) and a spatial cross Transformer (SCT) are proposed, and a feature decoder is built on them. In the decoder, the MFT provides valuable spatial information to the SCT, so that feature maps at different levels can refine pixel dependencies through the SCT. The proposed method is evaluated on a tire radiographic image dataset. Experimental results show that it reaches a detection accuracy of 98.57% and a mean intersection over union (mIoU) of 85.56%. Moreover, the method strikes a balance between accuracy and efficiency, characterizes the geometric shape of defects satisfactorily, and provides theoretical support for the future industrial deployment of Transformers.
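The mIoU figure is a standard pixel-level segmentation metric. The sketch below shows the common confusion-matrix definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. It is an illustration only. The abstract does not state the exact evaluation protocol (which classes are counted, whether background is included, or whether counts are accumulated per image or over the whole dataset), so those choices and the function names here are assumptions, not the authors' code.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Accumulate a num_classes x num_classes matrix over flattened pixel labels.

    Rows index the ground-truth class, columns the predicted class.
    (Hypothetical helper; label layout is an assumption.)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # class-c pixels predicted as other classes
        fp = sum(row[c] for row in cm) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 0 = background, 1 = defect, eight flattened pixels.
pred = [0, 0, 1, 1, 1, 0, 0, 0]
target = [0, 0, 1, 1, 0, 0, 0, 1]
cm = confusion_matrix(pred, target, num_classes=2)
print(cm)                        # [[4, 1], [1, 2]]
print(round(mean_iou(cm), 4))    # 0.5833
```

Here the background IoU is 4/6 and the defect IoU is 2/4, so the mean is about 0.583. Averaging per class, rather than pooling all pixels, keeps a small defect class from being swamped by the much larger background class.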