Unsupervised anomaly detection techniques, which do not require prior knowledge of anomalies, have attracted considerable attention in industrial surface inspection. However, existing approaches commonly train a separate model for each product class, which results in substantial storage requirements and an inefficient training phase. Accordingly, we propose a masked subspace-like transformer for multi-class anomaly detection (MSTAD), which employs an encoder–decoder architecture to reconstruct pre-trained image features, motivated by the greater robustness of high-level semantic features compared with low-level pixel features. To address identity mapping, the tendency of a model to overgeneralize and faithfully reconstruct abnormal samples, MSTAD integrates two essential components: the multi-layer subspace-like embedding (MLSE) module and the random block mask (RBM) method. The MLSE module uses an attention mechanism to selectively emphasize the common embeddings associated with each class, thereby improving both the model's ability to reconstruct anomalies as normal and its training capacity. RBM randomly masks blocks of the pre-trained feature map, which enhances the model's comprehension ability and improves its reconstruction of normal features. We conducted extensive experiments on the MVTec AD and BTAD datasets, and the results demonstrated that MSTAD outperformed previous state-of-the-art methods in both anomaly detection and localization for multi-class anomaly detection tasks.
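To make the idea of random block masking on a feature map concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single (C, H, W) feature map stored as a NumPy array, non-overlapping square blocks, a fixed mask ratio, and zero as the fill value. The abstract does not specify any of these details, and the function name `random_block_mask` and its parameters are hypothetical.

```python
import numpy as np


def random_block_mask(features, block_size=2, mask_ratio=0.3, rng=None):
    """Zero out randomly chosen non-overlapping square blocks of a (C, H, W) map.

    Hypothetical sketch: block_size, mask_ratio and zero-filling are assumptions,
    not the settings used by MSTAD. Returns (masked_copy, spatial_mask).
    """
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = features.shape
    if h % block_size or w % block_size:
        raise ValueError("H and W must be divisible by block_size")

    # Partition the spatial grid into gh x gw blocks of size block_size x block_size.
    gh, gw = h // block_size, w // block_size
    n_blocks = gh * gw
    n_masked = int(round(mask_ratio * n_blocks))

    # Pick which blocks to mask, without replacement.
    chosen = rng.choice(n_blocks, size=n_masked, replace=False)
    block_mask = np.zeros(n_blocks, dtype=bool)
    block_mask[chosen] = True

    # Upsample the block-level mask to pixel resolution (H, W).
    spatial = (
        block_mask.reshape(gh, gw)
        .repeat(block_size, axis=0)
        .repeat(block_size, axis=1)
    )

    # Apply the same spatial mask to every channel, leaving the input untouched.
    masked = features.copy()
    masked[:, spatial] = 0.0
    return masked, spatial


if __name__ == "__main__":
    feats = np.ones((4, 8, 8), dtype=np.float32)
    masked, mask = random_block_mask(feats, rng=np.random.default_rng(0))
    print(int(mask.sum()), "of", mask.size, "positions masked")
```

Under these assumptions, a reconstruction model would receive `masked` as input and be trained to recover the unmasked `features`. This forces it to infer normal content from the surrounding context rather than copy its input, which is the intuition behind using masking against identity mapping.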