Abstract

Abnormal event detection aims to identify the events that deviate from expected normal patterns. Existing methods usually extract normal spatio-temporal patterns of appearance and motion in a separate manner, which ignores low-level correlations between appearance and motion patterns and may fall short of capturing fine-grained spatio-temporal patterns. In this paper, we propose to simultaneously learn appearance and motion to obtain fine-grained spatio-temporal patterns. To this end, we present an adversarial 3D convolutional auto-encoder to learn the normal spatio-temporal patterns and then identify abnormal events by diverging them from the learned normal patterns in videos. The encoder captures the low-level correlations between spatial and temporal dimensions of videos, and generates distinctive features representing visual spatio-temporal information. The decoder reconstrucccts the original video from the encoded features representing by 3D de-convolutions and learns the normal spatio-temporal patterns in an unsupervised manner. We introduce the denoising reconstruction error and adversarial learning strategy to train the 3D convolutional auto-encoder to implicitly learn accurate data distributions that are considered normal patterns, which benefits enhancing the reconstruction ability of the auto-encoder to discriminate abnormal events. Both the theoretical analysis and the extensive experiments on four publicly available datasets demonstrate the effectiveness of our method.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call