Abstract

Video anomaly detection aims to identify abnormal events in videos. Deep reconstruction- and prediction-based models have been employed to detect anomalies. Deep reconstruction models sometimes reconstruct abnormal events along with normal ones, whereas prediction-based approaches have demonstrated encouraging results. This paper presents TransGANomaly, a novel anomaly detection approach based on a video vision transformer (ViViT) generative adversarial network (GAN). The proposed framework is a video frame predictor trained adversarially on normal video data only. The generator of the GAN is a ViViT network that receives 3D input tokens from video snippets and predicts the future frame from past sequences. The predicted and original frames are then passed to the model’s discriminator for binary classification. Extensive experiments on the UCSD Pedestrian, CUHK Avenue, and ShanghaiTech datasets validate the efficacy of the proposed method.
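The following is a minimal, illustrative sketch of the idea described above: a ViViT-style generator that tokenizes a clip of past frames into 3D (tubelet) tokens and predicts the next frame, trained adversarially against a binary discriminator on normal videos only. All module names, tensor sizes, and loss terms here are assumptions made for illustration, not the paper’s actual implementation.

```python
import torch
import torch.nn as nn

class ViViTGenerator(nn.Module):
    """ViViT-style future-frame predictor (illustrative sketch)."""
    def __init__(self, in_ch=3, img=64, clip_len=4, patch=8, dim=256, depth=4, heads=8):
        super().__init__()
        # Tubelet embedding: a 3D convolution turns the clip into spatio-temporal tokens.
        self.tokenize = nn.Conv3d(in_ch, dim,
                                  kernel_size=(clip_len, patch, patch),
                                  stride=(clip_len, patch, patch))
        n_tokens = (img // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Decode each token back into a patch of the predicted future frame.
        self.to_patch = nn.Linear(dim, in_ch * patch * patch)
        self.fold = nn.Fold(output_size=(img, img), kernel_size=patch, stride=patch)

    def forward(self, clip):                                   # clip: (B, C, T, H, W) past frames
        tok = self.tokenize(clip).flatten(2).transpose(1, 2)   # (B, N, dim) 3D tokens
        tok = self.encoder(tok + self.pos)
        patches = self.to_patch(tok).transpose(1, 2)           # (B, C*p*p, N)
        return torch.tanh(self.fold(patches))                  # predicted future frame

class Discriminator(nn.Module):
    """Binary classifier scoring frames as real or predicted."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, 2, 1),
        )

    def forward(self, frame):
        return self.net(frame).mean(dim=(1, 2, 3))             # one realism logit per frame

# One adversarial training step on a batch of *normal* clips (shapes assumed).
G, D = ViViTGenerator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

clip = torch.randn(2, 3, 4, 64, 64)       # past frames: (batch, C, T, H, W)
target = torch.randn(2, 3, 64, 64)        # ground-truth future frame

# Discriminator step: real future frames vs. generated predictions.
pred = G(clip).detach()
d_loss = bce(D(target), torch.ones(2)) + bce(D(pred), torch.zeros(2))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator while staying close to the true future frame.
pred = G(clip)
g_loss = bce(D(pred), torch.ones(2)) + nn.functional.l1_loss(pred, target)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

At test time, a large prediction error (or a low discriminator score) on an incoming frame would flag it as anomalous, since the model has only learned to predict normal motion patterns.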
