Abstract

We study weakly-supervised anomaly detection where only video-level “anomalous”/“normal” labels are available in training, while anomaly events should be temporally localized in testing. For this task, a commonly used framework is multiple instance learning (MIL), where clip instances are sampled from individual videos to form video-level bags. This sampling process arguably is a bottleneck of MIL. If too many instances are sampled, we not only encounter high computational overheads but also have many noisy instances in the bag. On the other hand, when too few instances are used, e.g., through enlarged grids, much background noise may be included in the anomaly instances. To resolve this dilemma, we propose a simple yet effective method named Sub-Max. In partitioned image regions, it identifies instances that are most probable candidates for anomaly events by selecting cuboids that have high optical flow magnitudes. We show that our method effectively brings down the computational cost of the baseline MIL and at the same time significantly filters out the influence of noise. Albeit simple, this strategy is shown to facilitate the learning of discriminative features and thus improve event classification and localization performance. For example, after annotating the event location ground truths of the UCF-Crime test set, we report very competitive accuracy compared with the state of the art on both frame-level and pixel-level metrics, corresponding to classification and localization, respectively.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call