Abstract

Unsupervised anomaly detection in videos is challenging because deep convolutional autoencoders generalize so well that they can also reconstruct anomalous events, and because anomalous events are complex in nature. In this study, we introduce a dissimilate-and-assimilate strategy for learning the essential patterns of multilevel latent representations of normal spatial and temporal information. To capture the core normality of appearance and motion samples across multiple layers of the network, the proposed method diversifies the latent patterns of normal spatial and temporal data so that out-of-distribution samples become discrete (dissimilation), and integrates the latent features of two different samples into a single sample using a feature attention mechanism for robust optimization (assimilation). Based on the learned representations, the network generates convincing predictions of the normal frame even when it receives abnormal samples after training. As a result, anomalous objects in a series of frames produce significant reconstruction errors, which leads to better detection and more precise localization. To verify the effectiveness of the proposed method, we quantify the precision of anomaly localization using the outside-inside error ratio, alongside the traditional area under the curve (AUC) metric for detection performance, on the UCSD Pedestrian 2, CUHK Avenue, and ShanghaiTech Campus datasets.
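The evaluation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names are hypothetical, frames are represented as small 2D lists of pixel intensities, and the per-frame anomaly score is assumed to be the mean squared error between the predicted and observed frame. The abstract does not define the outside-inside error ratio, so the version below (mean error outside the ground-truth anomaly mask divided by mean error inside it, where a lower value suggests tighter localization) is only one plausible reading.

```python
def frame_score(pred, target):
    """Mean squared error between a predicted and an observed frame (2D lists)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def frame_auc(scores, labels):
    """Frame-level AUC: probability that an anomalous frame (label 1)
    scores higher than a normal frame (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous frame")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def outside_inside_ratio(pred, target, mask):
    """ASSUMED definition (not given in the abstract): mean squared error
    outside the ground-truth anomaly mask divided by the mean inside it."""
    inside, outside = [], []
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            (inside if m else outside).append((p - t) ** 2)
    if not inside or not outside:
        raise ValueError("mask must contain both inside and outside pixels")
    mean_inside = sum(inside) / len(inside)
    if mean_inside == 0:
        raise ValueError("no error inside the mask; ratio is undefined")
    return (sum(outside) / len(outside)) / mean_inside


# Toy data: the predictor outputs a near-normal frame in both cases.
predicted = [[0.0, 0.1], [0.0, 0.0]]
normal_frame = [[0.0, 0.0], [0.0, 0.0]]
anomalous_frame = [[1.0, 0.0], [0.0, 0.0]]  # anomalous object at top-left
anomaly_mask = [[1, 0], [0, 0]]

scores = [frame_score(predicted, normal_frame),
          frame_score(predicted, anomalous_frame)]
print("frame scores:", scores)
print("AUC:", frame_auc(scores, [0, 1]))
print("outside/inside ratio:",
      outside_inside_ratio(predicted, anomalous_frame, anomaly_mask))
```

In this toy case the anomalous frame gets the larger score because the predictor still outputs a normal-looking frame, which is the behavior the abstract relies on. Benchmark evaluations would run over full test videos, typically with some score normalization that this sketch omits.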
