Visual anomaly detection has become a widely adopted solution in industrial manufacturing owing to its effectiveness and efficiency, yet it still faces several challenges. To address the diversity of anomaly types and the high cost of data annotation, this paper introduces TDAD, a self-supervised learning framework based on a two-stage diffusion model. TDAD comprises three key components: anomaly synthesis, image reconstruction, and defect segmentation. It is trained end-to-end with the goal of improving pixel-level anomaly segmentation accuracy and reducing false detection rates. By synthesizing anomalies from normal samples, designing a diffusion model-based reconstruction network, and incorporating a multiscale semantic feature fusion module for defect segmentation, TDAD achieves state-of-the-art performance in image-level detection and anomaly localization on the challenging and widely used MVTec and VisA benchmarks.