Abstract

Video Anomaly Detection (VAD) aims to automatically identify unexpected spatial–temporal patterns in order to detect abnormal events in surveillance videos. Existing unsupervised VAD methods either use a single proxy task to represent the prototypical normal patterns, or split the video into static and dynamic parts and learn spatial and temporal normality separately; neither approach can model the inherent uncertainty of motion. To address this, we propose, for the first time, to learn video normality via a stochastic motion representation. Specifically, the proposed Stochastic Video Normality (SVN) network learns prototypical local appearance patterns via deterministic multi-task learning and global motion patterns in a non-deterministic manner. The stochastic representation module uses recurrent networks to model historical motion as a conditional Gaussian distribution and infers motion explicitly from the learned prior. In addition, a masked auto-encoder is introduced to explore the relationship between appearance and motion, encouraging the model to learn spatial–temporal normality. Experimental results show that the proposed SVN network performs comparably to state-of-the-art methods. Extensive analysis demonstrates the effectiveness of multi-task-learning-based appearance representation and stochastic motion representation for unsupervised video anomaly detection.
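The core stochastic idea above can be illustrated with a minimal sketch: a recurrent update summarizes the motion history into a hidden state, which is then mapped to the mean and log-variance of a conditional Gaussian, and a motion sample is drawn via the reparameterization trick. All dimensions, weights, and function names below are hypothetical placeholders, not the paper's actual architecture or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not taken from the paper.
D_MOTION, D_HIDDEN = 8, 16

# Random weights standing in for learned parameters.
W_h = rng.normal(0, 0.1, (D_HIDDEN, D_HIDDEN))
W_x = rng.normal(0, 0.1, (D_HIDDEN, D_MOTION))
W_mu = rng.normal(0, 0.1, (D_MOTION, D_HIDDEN))
W_logvar = rng.normal(0, 0.1, (D_MOTION, D_HIDDEN))

def step(h, x):
    """One recurrent update folding a motion feature x into hidden state h."""
    return np.tanh(W_h @ h + W_x @ x)

def gaussian_prior(h):
    """Map the hidden state to the mean and log-variance of a conditional Gaussian."""
    return W_mu @ h, W_logvar @ h

def sample_motion(h):
    """Reparameterized sample: an explicit motion prediction from the learned prior."""
    mu, logvar = gaussian_prior(h)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Roll the recurrence over a short, synthetic history of motion features.
h = np.zeros(D_HIDDEN)
history = rng.normal(0, 1, (5, D_MOTION))
for x in history:
    h = step(h, x)

mu, logvar = gaussian_prior(h)
pred = sample_motion(h)
```

In a trained model the sampled motion would be compared against the observed motion; a large deviation from the conditional distribution signals an anomalous event.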
