Video anomaly detection remains a challenging task in computer vision due to data imbalance and susceptibility to scene variations such as lighting changes and occlusions. To address this challenge, this paper proposes an unsupervised video anomaly detection method based on an attention-enhanced memory network. The method adopts a dual-stream autoencoder architecture and introduces coordinate attention and variance attention mechanisms to strengthen the model's learning of important appearance and motion features, emphasizing the salient characteristics of static objects and rapidly moving regions. Memory modules added to both the appearance and motion branches reinforce the network's memorized information, enabling it to capture long-term spatiotemporal dependencies in videos and thereby improving the accuracy of anomaly detection. Furthermore, the activation functions are optimized to handle negative inputs, enhancing the network's nonlinear modeling capability so that it adapts better to complex environments, including lighting variations and occlusions, and further improving detection performance. Comparative experiments and ablation studies are conducted on three publicly available datasets against various models. The results show that, relative to the baseline models, AUC performance improves by 3.9%, 4.7%, and 1.7% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively. Compared with other models, the average AUC performance improves by 4.3%, 5.4%, and 6.2%, with an average improvement of 8.75% in the EER metric, validating the effectiveness and adaptability of the proposed method. The code is available at https://github.com/AcademicWhite/AEMNet.
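To make the attention component of the described architecture concrete, the sketch below shows a minimal coordinate-attention block of the kind the abstract mentions for the appearance branch. It is an illustrative assumption rather than the authors' exact implementation: the class name, channel sizes, and reduction ratio are hypothetical, and the released repository should be consulted for the actual configuration.

```python
# Minimal sketch (PyTorch) of a coordinate-attention block; sizes and the
# reduction ratio are illustrative assumptions, not the paper's exact settings.
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Rescales feature maps with per-height and per-width attention so the
    encoder can emphasize informative spatial positions of static objects."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Encode positional information along each axis, then process jointly.
        x_h = self.pool_h(x)                            # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)        # (B, C, W, 1)
        y = self.shared(torch.cat([x_h, x_w], dim=2))   # (B, hidden, H+W, 1)
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # Per-axis attention maps rescale the input features.
        a_h = torch.sigmoid(self.attn_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    print(CoordinateAttention(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```

In this kind of block, pooling separately along height and width preserves positional information in each direction, which is why it suits emphasizing where static objects sit in the frame; the variance attention and memory modules described above would be separate components in the motion branch and at the encoder bottleneck, respectively.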