To improve the ability of video anomaly detection models to extract features of normal behavior and suppress those of abnormal behavior, this paper proposes an unsupervised video anomaly detection model that combines spatio-temporal feature fusion, a storage (memory) module, an attention mechanism, and a 3D autoencoder. The model first uses an autoencoder to capture scene feature maps, which strengthens anomaly feature extraction. These maps are merged with the original video frames, and each merged frame forms a basic unit; consecutive units are assembled into sequences that serve as the model's input. In addition, the attention mechanism is integrated into the 3D convolutional neural network to strengthen its ability to extract channel and spatial features from videos. Experiments on a publicly available campus dataset show that the model achieves superior anomaly detection accuracy.
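The input construction described above (scene feature maps merged with the original frames, then grouped into continuous sequences) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `build_input_clips`, the choice of channel-wise concatenation as the merge operation, and the `clip_len` and `stride` parameters are all assumptions introduced here for illustration.

```python
import numpy as np

def build_input_clips(frames, scene_maps, clip_len=8, stride=1):
    """Merge each frame with its scene feature map, then cut the merged
    sequence into clips of consecutive units (hypothetical helper).

    frames:     array of shape (T, H, W, C)
    scene_maps: array of shape (T, H, W, K)
    returns:    array of shape (num_clips, clip_len, H, W, C + K)
    """
    if frames.shape[:3] != scene_maps.shape[:3]:
        raise ValueError("frames and scene_maps must share (T, H, W)")
    # Assumed merge: channel-wise concatenation, one unit per time step.
    units = np.concatenate([frames, scene_maps], axis=-1)
    num_units = units.shape[0]
    if num_units < clip_len:
        raise ValueError("not enough frames for one clip")
    # Sliding window over time yields overlapping continuous sequences.
    starts = range(0, num_units - clip_len + 1, stride)
    return np.stack([units[s:s + clip_len] for s in starts])
```

Under these assumptions, each returned clip has the layout a 3D convolutional autoencoder would consume after moving the channel axis to wherever the chosen framework expects it.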