Abstract

Video anomaly detection (VAD) under weak supervision aims to temporally locate abnormal clips using easy-to-obtain video-level labels. In this brief, we introduce the underlying idea of unsupervised VAD into weakly supervised VAD and propose a collaborative normality learning framework to obtain more discriminative deep representations. Specifically, a deep auto-encoder is first trained in an unsupervised manner to learn the prototypical spatial-temporal patterns of normal videos. Then, both normal and abnormal videos are used to train a regression module whose objective is to make the average score of an abnormal video higher than the maximum score of a normal video. Finally, the clips in abnormal videos whose anomaly scores are lower than the average are regarded as normal and used to fine-tune the trained auto-encoder. The unsupervised auto-encoder collaborates with the weakly supervised regression model to extract prototypical features of normal clips, making the learned features of normal and abnormal events more distinguishable. Experimental results on three benchmark datasets show that the proposed framework achieves performance comparable to state-of-the-art methods, and ablation studies further demonstrate the validity of collaborative normality learning.
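
The sketch below illustrates, in PyTorch, the three-step collaborative scheme summarized above: unsupervised auto-encoder training on normal clips, weakly supervised score regression with the average-versus-maximum ranking objective, and fine-tuning of the auto-encoder on pseudo-normal clips selected from abnormal videos. The network architectures, feature dimension, margin, and learning rates are illustrative assumptions, not the authors' implementation details.

```python
# Minimal sketch of collaborative normality learning (assumed details, not the paper's code).
import torch
import torch.nn as nn

FEAT_DIM = 2048  # assumed dimensionality of pre-extracted clip features


class AutoEncoder(nn.Module):
    """Unsupervised model of prototypical normal spatial-temporal patterns."""
    def __init__(self, dim=FEAT_DIM, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))


class Regressor(nn.Module):
    """Weakly supervised module mapping clip features to anomaly scores in [0, 1]."""
    def __init__(self, dim=FEAT_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x).squeeze(-1)


def ranking_loss(scores_abnormal, scores_normal, margin=1.0):
    # Encourage the average score of the abnormal video's clips to exceed
    # the maximum score over the normal video's clips (as stated in the abstract).
    return torch.relu(margin - scores_abnormal.mean() + scores_normal.max())


def pseudo_normal_clips(features_abnormal, scores_abnormal):
    # Clips of an abnormal video scoring below that video's average score
    # are treated as normal and reused to fine-tune the auto-encoder.
    mask = scores_abnormal < scores_abnormal.mean()
    return features_abnormal[mask]


if __name__ == "__main__":
    ae, reg = AutoEncoder(), Regressor()
    recon_loss = nn.MSELoss()
    opt_ae = torch.optim.Adam(ae.parameters(), lr=1e-4)
    opt_reg = torch.optim.Adam(reg.parameters(), lr=1e-4)

    # Toy stand-ins for pre-extracted clip features of one normal / one abnormal video.
    normal_clips = torch.randn(32, FEAT_DIM)
    abnormal_clips = torch.randn(32, FEAT_DIM)

    # Step 1: unsupervised normality learning on normal clips.
    opt_ae.zero_grad()
    recon_loss(ae(normal_clips), normal_clips).backward()
    opt_ae.step()

    # Step 2: weakly supervised score regression with the ranking objective.
    opt_reg.zero_grad()
    ranking_loss(reg(abnormal_clips), reg(normal_clips)).backward()
    opt_reg.step()

    # Step 3: fine-tune the auto-encoder on pseudo-normal clips from the abnormal video.
    with torch.no_grad():
        pseudo = pseudo_normal_clips(abnormal_clips, reg(abnormal_clips))
    if pseudo.numel() > 0:
        opt_ae.zero_grad()
        recon_loss(ae(pseudo), pseudo).backward()
        opt_ae.step()
```

In the full framework these steps would alternate over the training set, so that the regressor's pseudo-labels progressively refine the auto-encoder's notion of normality; the single-pass loop here is only meant to make the data flow concrete.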
