Abstract

Industrial defect detection plays an increasingly important role in controlling the quality of industrial products. Achieving highly accurate and efficient detection of complex and variable industrial defect types is therefore an important but challenging problem. Vision transformers have been highly successful in a variety of computer vision tasks owing to their ability to capture global information in images. Nevertheless, relying solely on global information is problematic. On the one hand, because transformers lack the inductive biases of Convolutional Neural Networks (CNNs), they have difficulty focusing on the local features of defects in industrial defect image inspection tasks. On the other hand, global attention incurs excessive memory and computational cost. To mitigate these issues, we propose a new vision transformer architecture that combines Hybrid Window Attention (HWA) and Dynamic Token Normalization (DTN). HWA combines pooling attention with window attention to reduce computational complexity and improve efficiency. DTN enables the transformer to attend to both global information and the local features of defects, thereby improving the accuracy of industrial surface defect detection. Extensive experiments demonstrate that our Dynamic Vision Transformer (DHT) achieves 96.8% and 98.5% classification accuracy on the NEU dataset and the DAGM dataset, respectively, with low computational complexity.
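
The abstract only names the HWA components; the sketch below is a minimal NumPy illustration, under assumptions, of one plausible way to combine local window attention with pooled global attention so that the key/value set for the global branch is shortened. The function and parameter names (hybrid_window_attention, window, pool) are illustrative, query/key/value projections and multi-head structure are omitted, and this is not the authors' exact HWA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (n_q, d), (n_k, d), (n_k, d) -> (n_q, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def hybrid_window_attention(x, grid_h, grid_w, window=4, pool=4):
    """Illustrative hybrid of local window attention and pooled global attention.

    x: (grid_h * grid_w, d) tokens laid out on a grid_h x grid_w grid.
    window: side length of the non-overlapping local windows.
    pool: stride of the average pooling that builds the short global key/value set.
    """
    n, d = x.shape
    assert n == grid_h * grid_w and grid_h % window == 0 and grid_w % pool == 0

    tokens = x.reshape(grid_h, grid_w, d)

    # Local branch: full attention only inside each window -> O(n * window^2).
    local = np.zeros_like(tokens)
    for i in range(0, grid_h, window):
        for j in range(0, grid_w, window):
            win = tokens[i:i + window, j:j + window].reshape(-1, d)
            local[i:i + window, j:j + window] = attention(win, win, win).reshape(window, window, d)

    # Global branch: every token attends to average-pooled tokens, shrinking
    # the key/value length by pool^2 -> O(n * n / pool^2) instead of O(n^2).
    pooled = tokens.reshape(grid_h // pool, pool, grid_w // pool, pool, d).mean(axis=(1, 3))
    global_ctx = attention(tokens.reshape(-1, d), pooled.reshape(-1, d), pooled.reshape(-1, d))

    # Combine local detail (defect texture) with pooled global context.
    return local.reshape(-1, d) + global_ctx

# Example: a 16x16 grid of 32-dimensional tokens.
out = hybrid_window_attention(np.random.randn(256, 32), 16, 16)
print(out.shape)  # (256, 32)
```

Under this reading, the efficiency gain comes from replacing one quadratic attention over all tokens with a windowed term plus a term against a key/value set that is pool^2 times shorter.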
