Video anomaly detection (VAD) has been extensively studied for static cameras but is much more challenging in egocentric driving videos where the scenes are extremely dynamic. This paper proposes an unsupervised method for traffic VAD based on future object localization. The idea is to predict future locations of traffic participants over a short horizon, and then monitor the accuracy and consistency of these predictions as evidence of an anomaly. Inconsistent predictions tend to indicate an anomaly has occurred or is about to occur. To evaluate our method, we introduce a new large-scale benchmark dataset called Detection of Traffic Anomaly (DoTA)containing 4,677 videos with temporal, spatial, and categorical annotations. We also propose a new VAD evaluation metric, called spatial-temporal area under curve (STAUC), and show that it captures how well a model detects both temporal and spatial locations of anomalies unlike existing metrics that focus only on temporal localization. Experimental results show our method outperforms state-of-the-art methods on DoTA in terms of both metrics. We offer rich categorical annotations in DoTA to benchmark video action detection and online action detection methods. The DoTA dataset has been made available at: https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly.
Read full abstract