Abstract

The coexistence of subtle and long-range anomalies in real-world industrial applications poses significant challenges for anomaly localization. Existing methods typically train deep models on multi-level patches or use layer-fusion approaches to learn the global-local distribution. However, these methods do not learn local and global features simultaneously and therefore suffer from inaccurate localization. To this end, we propose a hybrid transformer model (ViTALnet) based on fine-grained feature reconstruction. ViTALnet first adopts a vision transformer to extract locally discriminative features as the feature representation, leveraging the transformer's capability to capture global semantics. Then, an anomaly estimation module integrates global attention with a pyramidal architecture to enhance contextual information for fine-grained anomaly localization. Extensive experiments were conducted on the industrial anomaly localization datasets MVTec AD-Texture and NanoTWICE and on the general textured datasets KolektorSDD2, MT Defect, and Dot-patterned Fabric, where ViTALnet outperformed major state-of-the-art methods.
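
As a rough illustration of how feature reconstruction can yield a localization map, the sketch below scores each spatial position by the error between an extracted feature vector and its reconstruction. It is not the authors' implementation: the abstract does not specify the scoring function, so the squared L2 distance, the fixed threshold, and the names `anomaly_map` and `localize` are all assumptions made for this example.

```python
from math import fsum


def anomaly_map(features, reconstructed):
    """Return an H x W grid of squared L2 reconstruction errors.

    Both inputs are H x W x C nested lists of floats: one C-dimensional
    feature vector per spatial position. (Hypothetical helper; the
    distance measure is an assumption, not taken from the paper.)
    """
    scores = []
    for row_f, row_r in zip(features, reconstructed):
        scores.append([
            # Sum of squared channel differences at one position.
            fsum((a - b) ** 2 for a, b in zip(vec_f, vec_r))
            for vec_f, vec_r in zip(row_f, row_r)
        ])
    return scores


def localize(scores, threshold):
    """Return (row, col) positions whose score exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if score > threshold
    ]


# Toy 2 x 2 feature grid with 2 channels; one position is poorly reconstructed.
features = [[[0.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 0.0]]]
reconstructed = [[[0.0, 0.0], [3.0, 4.0]],
                 [[0.0, 0.0], [0.0, 0.0]]]

scores = anomaly_map(features, reconstructed)
flagged = localize(scores, threshold=1.0)
```

In this toy case only the position whose reconstruction deviates from the extracted feature is flagged. A real pipeline would obtain the features from a vision transformer and the reconstruction from a trained estimation module, then upsample the score grid to image resolution.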
