Video anomaly detection is a challenging computer vision task that aims to identify video segments containing unusual motions or objects. Most recent techniques have focused on reconstruction and prediction methods. In practice, however, frame reconstruction methods deliver suboptimal results, because convolutional neural networks generalize so well that they also reconstruct abnormal frames accurately. Frame prediction methods, meanwhile, have drawn much attention as a powerful way of modeling the dynamics of natural scenes. This paper proposes a new unsupervised, frame prediction-based algorithm for anomaly detection that improves overall performance. Our approach follows a U-Net-like architecture with a time-distributed 2D CNN-based encoder and a 2D CNN-based decoder. A memory module stores the most relevant prototypical patterns of normal scenes in its memory slots during training and retrieves them at prediction time, so that the model produces poor predictions when the input is unusual. So that the memory module retains normal semantic patterns at multiple scales, we propose an upstream multi-branch structure composed of dilated convolutions to extract contextual information. We also introduce a multi-path structure that incorporates temporal information directly into the network design, as a substitute for an optical flow loss function. Experiments on the UCSD Ped1, UCSD Ped2, and CUHK Avenue benchmark datasets show that our design outperforms most competing models.
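To illustrate why a memory of normal patterns can lead to poor predictions for unusual input, the following is a minimal sketch of a memory read. The abstract does not specify the addressing scheme, so the cosine-similarity scoring, the softmax weighting, and the names `memory_read`, `cosine`, and `softmax` are all assumptions made for illustration, not the paper's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def memory_read(query, memory):
    """Hypothetical read: blend memory slots by their similarity to the query.

    Assumption: slots hold prototypes of normal features, and the read is a
    softmax-weighted sum of slots. The paper may use a different scheme.
    """
    weights = softmax([cosine(query, slot) for slot in memory])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(query))]
    return read, weights

def distance(a, b):
    """Euclidean distance, used here as a stand-in for prediction error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two toy slots standing in for prototypical normal patterns.
memory = [[1.0, 0.0], [0.0, 1.0]]

normal_query = [0.9, 0.1]    # close to the first prototype
odd_query = [-1.0, -1.0]     # unlike any stored prototype

normal_read, normal_w = memory_read(normal_query, memory)
odd_read, odd_w = memory_read(odd_query, memory)

normal_error = distance(normal_query, normal_read)
odd_error = distance(odd_query, odd_read)
```

In this toy setting the read for `odd_query` can only be a blend of normal prototypes, so it lands far from the query, whereas the read for `normal_query` stays close to it. A decoder fed the retrieved features would therefore be expected to predict abnormal frames less accurately.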