Abstract

Video anomaly detection aims to identify anomalous segments in a video. Models are typically trained with weak supervision from video-level labels. This paper focuses on two crucial factors affecting the performance of video anomaly detection models. First, we explore how to capture local and global temporal dependencies more effectively. Previous architectures are effective at capturing either local or global information, but not both. We propose a U-Net-like structure that models both types of dependencies in a unified architecture: the encoder learns global dependencies hierarchically on top of local ones, and the decoder then propagates this global information back to the segment level for classification. Second, overfitting is a non-trivial issue for video anomaly detection due to limited training data. We propose weakly supervised contrastive regularization, which adopts a feature-based approach to regularize the network. Contrastive regularization learns more generalizable features by enforcing inter-class separability and intra-class compactness. Extensive experiments on the UCF-Crime dataset show that our approach outperforms several state-of-the-art methods.
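
The encoder-decoder data flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`downsample`, `upsample`, `temporal_unet`) are hypothetical, fixed pairwise averaging stands in for the learned encoder layers, nearest-neighbour repetition stands in for the learned decoder layers, and additive skip connections are an assumption. The sketch only shows how coarse (global) context can be built on top of segment-level (local) features and then propagated back to every segment.

```python
from typing import List

Vector = List[float]


def downsample(segments: List[Vector]) -> List[Vector]:
    """Halve temporal resolution by averaging adjacent segment pairs.

    Assumption: a real encoder would use learned layers instead of a mean.
    """
    out = []
    for i in range(0, len(segments), 2):
        pair = segments[i:i + 2]  # the last "pair" may hold a single segment
        out.append([sum(vals) / len(pair) for vals in zip(*pair)])
    return out


def upsample(segments: List[Vector], length: int) -> List[Vector]:
    """Restore temporal resolution by repeating each coarse segment."""
    return [segments[min(i // 2, len(segments) - 1)] for i in range(length)]


def temporal_unet(features: List[Vector], depth: int = 2) -> List[Vector]:
    """Return per-segment features fused with multi-scale temporal context."""
    skips = []
    x = features
    # Encoder: each level summarizes a wider temporal window than the last.
    for _ in range(depth):
        skips.append(x)
        x = downsample(x)
    # Decoder: push coarse context back down and fuse it with the skip
    # features (additive fusion is an assumption of this sketch).
    for skip in reversed(skips):
        up = upsample(x, len(skip))
        x = [[a + b for a, b in zip(s, u)] for s, u in zip(skip, up)]
    return x


if __name__ == "__main__":
    # Four segments with one-dimensional features, for illustration only.
    print(temporal_unet([[1.0], [3.0], [5.0], [7.0]], depth=2))
```

The output has one feature vector per input segment, so a segment-level classifier could be applied to it directly; each vector mixes the segment's own feature with averages over progressively wider temporal windows.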
