In computer vision, anomaly detection is a binary classification task that identifies anomalous instances within image datasets. It is typically divided into two subtasks: texture defect detection and semantic anomaly detection. Existing methods often use pre-trained feature extractors to capture either the semantic or the spatial features of an image in isolation, and then employ different classifiers for the two subtasks. As a result, they do not exploit the complementary relationship between the two feature types, and they tend to excel at one subtask while performing poorly on the other. We therefore propose an approach that combines both feature types in a single normalizing flow learning module to address both subtasks. Specifically, we first adopt a pre-trained Vision Transformer (ViT) to capture both the texture and the semantic features of the input image. Taking the semantic features as input, we then design a novel normalizing flow model to fit the semantic distribution of normal data. In addition, we introduce an attention-based feature fusion module that integrates the most relevant texture and semantic information from the two feature types, enhancing the model's ability to represent the spatial texture and the semantic content of the input image simultaneously. Finally, we conduct comprehensive experiments on well-known semantic and texture anomaly detection datasets, namely CIFAR-10 and MVTec, to evaluate the proposed method. The results show that our model performs strongly on both semantic and texture anomaly detection, and achieves state-of-the-art results on semantic anomaly detection in particular.
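To make the role of the normalizing flow concrete, the following is a minimal sketch of flow-based anomaly scoring, not the model proposed here. It assumes a single elementwise affine flow with a standard normal base distribution, fitted in closed form, and it uses short hand-written vectors as hypothetical stand-ins for ViT features. The names `AffineFlow` and `anomaly_score` are illustrative only. The score is the negative log-likelihood given by the change-of-variables formula, so inputs that are unlikely under the distribution fitted to normal data receive higher scores.

```python
import math
from typing import Sequence


class AffineFlow:
    """Elementwise affine flow z = (x - shift) * exp(-log_scale).

    Illustrative stand-in for a learned normalizing flow; the base
    distribution is a standard normal in every dimension.
    """

    def __init__(self, dim: int) -> None:
        self.shift = [0.0] * dim
        self.log_scale = [0.0] * dim

    def fit(self, features: Sequence[Sequence[float]], eps: float = 1e-6) -> None:
        # For this flow, maximum likelihood has a closed form:
        # shift is the per-dimension mean, scale the per-dimension std.
        n = len(features)
        for d in range(len(self.shift)):
            column = [f[d] for f in features]
            mean = sum(column) / n
            var = sum((v - mean) ** 2 for v in column) / n
            self.shift[d] = mean
            self.log_scale[d] = 0.5 * math.log(var + eps)

    def log_prob(self, x: Sequence[float]) -> float:
        # Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|,
        # where log|det dz/dx| = -sum(log_scale) for this flow.
        total = 0.0
        for xi, m, ls in zip(x, self.shift, self.log_scale):
            z = (xi - m) * math.exp(-ls)
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - ls
        return total


def anomaly_score(flow: AffineFlow, x: Sequence[float]) -> float:
    """Negative log-likelihood: higher means more anomalous."""
    return -flow.log_prob(x)


# Hypothetical 2-D "semantic features" of normal training images.
normal_features = [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9]]
flow = AffineFlow(dim=2)
flow.fit(normal_features)

typical = anomaly_score(flow, [0.05, 0.95])  # close to the training data
outlier = anomaly_score(flow, [5.0, -3.0])   # far from the training data
print(outlier > typical)
```

A practical flow would stack learned invertible layers and operate on high-dimensional ViT features, possibly after fusion with texture features, but the scoring rule is the same: fit the density on normal data only, then threshold the negative log-likelihood at test time.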