Abstract

Detecting anomalies in real-world surveillance videos from weakly supervised data has emerged as a challenge in recent years. Traditional methods, which apply multiple instance learning (MIL) to video snippets, struggle with background noise and tend to overlook subtle anomalies. To tackle this, the authors propose an approach that crops snippets to create multiple instances with less noise, evaluates them separately and then fuses these evaluations for more precise anomaly detection. This, however, raises computational demands, especially during inference. To address this, the authors employ mutual learning, using the low-noise crops to guide the training of snippet features. During training, the authors combine MIL for the primary task, with snippets as inputs, and multiple-multiple instance learning (MMIL) for an auxiliary task, with crops as inputs. The approach enforces consistent multi-instance results across both tasks and incorporates a temporal activation mutual learning (TAML) module that aligns temporal anomaly activations between snippets and crops, improving the overall quality of snippet representations. Additionally, a snippet feature discrimination enhancement (SFDE) module further refines the snippet features. Tested across various datasets, the authors' method performs strongly, notably achieving a frame-level AUC of 85.78% on the UCF-Crime dataset, while reducing computational costs.
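
The interplay between snippet scores and crop scores described above can be illustrated with a minimal sketch. It is not the authors' implementation: the top-k mean for the video-level MIL score, the max over crops as the fusion rule, the mean squared error as the alignment term, and every function name and number below are assumptions chosen only for illustration.

```python
from statistics import mean

def topk_mean(scores, k):
    """Video-level MIL score: mean of the k highest instance scores (assumed rule)."""
    return mean(sorted(scores, reverse=True)[:k])

def fuse_crops(crop_scores):
    """Fuse per-crop scores into one score per snippet (assumed: max over crops)."""
    return [max(per_snippet) for per_snippet in crop_scores]

def alignment_loss(snippet_scores, crop_scores):
    """Mean squared gap between snippet activations and fused crop activations.

    A hypothetical stand-in for a temporal alignment term; the abstract does
    not specify the actual loss.
    """
    fused = fuse_crops(crop_scores)
    return mean((s - c) ** 2 for s, c in zip(snippet_scores, fused))

# Toy data: 4 snippets, each with 2 crops (all values are made up).
snippet_scores = [0.1, 0.2, 0.9, 0.3]
crop_scores = [[0.1, 0.0], [0.3, 0.2], [0.8, 0.9], [0.2, 0.3]]

video_score = topk_mean(snippet_scores, k=2)        # primary task on snippets
loss = alignment_loss(snippet_scores, crop_scores)  # auxiliary guidance from crops
print(video_score, loss)
```

In this sketch only the snippet branch would be needed at inference, which mirrors the abstract's point that crops guide training without adding inference cost.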
