Abstract

Abnormal behavior detection in surveillance videos is necessary for public monitoring and safety. Human-based surveillance systems require continuous attention and observation, which is a difficult task, so autonomous detection of such events is of essential importance. However, due to the scarcity of labeled data and the low occurrence probability of these events, abnormal event detection is a challenging vision problem. In this paper, we introduce a novel two-stage architecture for detecting anomalous behavior in videos. In the first stage, we propose a 3D Convolutional Autoencoder (3D-CAE) architecture to extract spatio-temporal features from normal event training videos. In 3D-CAE, the encoder and decoder architectures are based on 3D convolutions, which can learn both appearance and motion features effectively in an unsupervised manner. In the second stage, we group the 3D spatio-temporal features into different normality clusters and remove the sparse clusters to represent a stronger pattern of normality. From these clusters, a one-class SVM classifier is used to distinguish between normal and abnormal events based on the normality scores. Experimental results on four different benchmark datasets show significant performance improvement compared to state-of-the-art approaches while providing results in real time.
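
The following is a minimal sketch of the two-stage pipeline described in the abstract, assuming PyTorch and scikit-learn. The layer widths, clip shape, number of clusters, sparse-cluster threshold, and one-class SVM hyperparameters are illustrative assumptions, not the authors' exact configuration.

# Sketch of the two-stage anomaly-detection pipeline (assumed PyTorch + scikit-learn);
# all architectural choices and hyperparameters here are illustrative, not the paper's.
import torch
import torch.nn as nn
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM


class CAE3D(nn.Module):
    """Stage 1: 3D convolutional autoencoder learning spatio-temporal features from normal clips."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, kernel_size=3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)          # latent spatio-temporal features
        return self.decoder(z), z    # reconstruction used for unsupervised training


def extract_features(model, clips):
    """Encode clips of shape (N, 1, T, H, W) into flattened latent vectors."""
    model.eval()
    with torch.no_grad():
        _, z = model(clips)
    return z.flatten(start_dim=1).cpu().numpy()


def fit_normality_model(features, n_clusters=10, min_cluster_frac=0.05):
    """Stage 2: cluster normal features, drop sparse clusters, fit a one-class SVM."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    counts = np.bincount(km.labels_, minlength=n_clusters)
    keep = counts >= min_cluster_frac * len(features)             # discard sparse clusters
    dense = features[np.isin(km.labels_, np.where(keep)[0])]
    return OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(dense)


if __name__ == "__main__":
    # Toy usage with random "normal" clips of shape (1, 16, 64, 64).
    model = CAE3D()
    clips = torch.rand(32, 1, 16, 64, 64)
    # (Autoencoder training on a reconstruction loss is omitted for brevity.)
    feats = extract_features(model, clips)
    oc_svm = fit_normality_model(feats)
    scores = oc_svm.decision_function(feats)   # lower score => more anomalous
    print(scores[:5])

At test time, each incoming clip would be encoded by the trained autoencoder and scored by the one-class SVM; clips whose normality score falls below a threshold are flagged as abnormal.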
