Abstract

We consider weakly-supervised video anomaly detection in this work. The task is to learn to localize video frames containing anomalous events given only binary video-level annotations, <i>i.e.</i>, anomaly vs. normal. Traditional approaches usually formulate it as a multiple instance learning problem, which ignores the intrinsic data imbalance: positive samples are very scarce compared to negative ones. In this paper, we focus on addressing this issue to further boost detection performance. We develop a new lightweight anomaly detection model that fully exploits the abundant normal videos to train a classifier with strong discriminative ability on normal videos, and we employ this classifier to improve the selectivity for anomalous segments and to filter out normal segments. Specifically, in addition to boosting anomalous predictions, a novel contrastive attention module produces a converted normal feature from an anomalous video to refine the anomalous predictions, by maximizing the chance that the classifier makes a mistake on it. Moreover, to remove the stubborn normal segments still selected by the attention module, we design an attention consistency loss that uses the classifier's high-confidence predictions on normal features to guide the attention module. Extensive experiments on three large-scale datasets, UCF-Crime, ShanghaiTech, and XD-Violence, clearly demonstrate that our model largely improves frame-level AUC over the state of the art. Code is released at <uri>https://github.com/changsn/Contrastive-Attention-for-Video-Anomaly-Detection</uri>.
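To make the contrastive attention idea concrete, the following is a minimal NumPy sketch, not the paper's implementation: attention weights over a video's segment features yield an attended "anomalous" feature, while the re-normalized complement of the same weights yields the converted "normal" feature that the frozen classifier is pushed to misjudge. All names, shapes, and the complement construction here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# T segment features of dimension D from one anomalous video (toy stand-ins).
T, D = 8, 16
X = rng.normal(size=(T, D))

# Hypothetical attention scores; in the model these come from a learned module.
scores = rng.normal(size=T)
a = softmax(scores)                      # attention over segments, sums to 1

# Attended "anomalous" feature: the segments the attention emphasizes.
f_anom = a @ X                           # shape (D,)

# Converted "normal" feature: the complementary attention, re-normalized,
# i.e. the segments the attention suppresses.
a_comp = (1.0 - a) / (1.0 - a).sum()
f_norm = a_comp @ X                      # shape (D,)

# Both features are convex combinations of the segment features; a contrastive
# objective would train the attention so a frozen normal-vs-anomaly classifier
# labels f_norm as normal (i.e., "makes a mistake" on the anomalous video).
assert np.isclose(a.sum(), 1.0) and np.isclose(a_comp.sum(), 1.0)
```

The attention consistency loss described above would additionally penalize disagreement between the classifier's confident normal predictions and the attention weights, discouraging the attention from retaining normal segments.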
