Abstract

Video anomaly detection (VAD) involves identifying events or behaviours in video sequences that deviate from expected patterns. Most VAD models to date pursue incremental improvement by directly learning identifiable visual cues from information-rich appearance data, disregarding the critical issue of privacy and data security in public places. This paper explores privacy-preserving VAD using privacy-independent data, such as human body skeletons and optical flow. However, because normal and anomalous samples are imbalanced, directly learning the consistency of heterogeneous data may result in normality bias. To address these issues, we propose a novel motion exemplar-guided approach (a.k.a. Prime) that explicitly incorporates a support set of human skeleton poses into the VAD framework to break through the usefulness-versus-privacy dilemma. The support set, which contains diverse motion exemplars from a large-scale human skeleton-based action database, enables our model to disentangle coarsely defined anomalies. To learn the abnormal consistency between poses and optical flow, we introduce a Non-Minimum Suppression (NMS) strategy that adaptively highlights the correlation of anomalous pairs. The proposed architecture allows us to train our model end-to-end under both fully and weakly supervised paradigms. We evaluate our method on three well-established VAD datasets (UCSD Ped2, Avenue, and ShanghaiTech) in both privacy and non-privacy settings. The results demonstrate that our approach surpasses most state-of-the-art (SOTA) methods under both fully supervised and weakly supervised paradigms.
