Abstract

Video anomaly detection aims to automatically identify predefined anomalous content (e.g., abnormal objects, behaviors, and scenes) in videos. Detection performance can be improved effectively by making the model focus on the anomalous objects in a video; however, existing approaches of this kind usually rely on pre-trained models, which not only require additional auxiliary information but also struggle with the diversity of real-world anomalies. In this paper, we propose a new video anomaly detection method based on spatio-temporal relationships among objects. Concretely, we use a fully convolutional encoder-decoder network with symmetric skip connections as the backbone, which effectively extracts features from object regions at different scales. In the encoding stage, an attention mechanism enhances the model’s understanding of the spatio-temporal relationships among the various types of objects in the video. In the decoding stage, a dynamic pattern generator is designed to memorize inter-object spatio-temporal relationships, which enhances the reconstruction of normal samples while making the reconstruction of abnormal samples more difficult. We conduct extensive experiments on three widely used video anomaly detection datasets (CUHK Avenue, ShanghaiTech Campus, and UCSD Ped2); the results show that our method significantly improves detection performance and achieves state-of-the-art overall performance, considering both effectiveness and efficiency. In particular, it achieves a state-of-the-art AUC of 98.4% on the UCSD Ped2 dataset, which contains a variety of real-world anomalies.
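The abstract describes a reconstruction-based detector: the network is trained to reconstruct normal samples well, so frames it reconstructs poorly receive high anomaly scores. The following is a minimal, self-contained sketch of that scoring step only; the `toy_model` stand-in (a simple neighbor-averaging filter) and the min-max score normalization are illustrative assumptions, not the paper's encoder-decoder network.

```python
# Sketch of reconstruction-based anomaly scoring, the principle behind
# encoder-decoder video anomaly detectors: poorly reconstructed frames
# get high anomaly scores. The model below is a hypothetical stand-in.

def reconstruction_error(frame, reconstruction):
    """Mean squared error between a frame and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(frame, reconstruction)) / len(frame)

def anomaly_scores(frames, model):
    """Score each frame by how badly the model reconstructs it."""
    errors = [reconstruction_error(f, model(f)) for f in frames]
    lo, hi = min(errors), max(errors)
    # Min-max normalize errors to [0, 1], a common step before
    # computing frame-level AUC against ground-truth labels.
    return [(e - lo) / (hi - lo) if hi > lo else 0.0 for e in errors]

def toy_model(frame):
    """Toy "trained" model: smooths its input, so smooth (normal)
    patterns reconstruct well and abrupt (abnormal) ones do not."""
    out = []
    for i in range(len(frame)):
        left = frame[max(i - 1, 0)]
        right = frame[min(i + 1, len(frame) - 1)]
        out.append((left + frame[i] + right) / 3.0)
    return out

normal = [1.0, 1.0, 1.0, 1.0]    # smooth: near-zero reconstruction error
abnormal = [0.0, 5.0, 0.0, 5.0]  # abrupt: large reconstruction error
scores = anomaly_scores([normal, abnormal], toy_model)
```

In the paper's setting the model is the attention-equipped encoder-decoder with the dynamic pattern generator, which widens exactly this error gap between normal and abnormal inputs.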
