Anomaly detection in video surveillance is critical for security and public safety in settings such as traffic monitoring, public spaces, and industrial sites. Traditional methods often struggle with the complexity and variability of real-world data, prompting a shift towards advanced machine learning models. This paper presents a comprehensive analysis of deep learning algorithms, including YOLOv5, 3D CNNs, LSTM, Deep SVDD, Vision Transformers, Temporal Transformers, and Autoencoders, applied to three benchmark datasets: CIFAR-10, MVTec AD, and UCSD Anomaly Detection. We compare these algorithms on accuracy, precision, recall, and F1-score, and discuss their strengths and weaknesses. The results suggest that Vision Transformers and CNN-LSTM hybrids offer superior performance across spatial and temporal anomaly detection tasks.