Abstract

Weakly supervised video anomaly detection has recently become a focus of computer vision research, thanks to the availability of large-scale weakly supervised video datasets. However, most existing works are limited to frame-level classification, with an emphasis on detecting the presence of specific objects or activities. In this article, a new neural network architecture is proposed to efficiently extract the prominent features for detecting whether a video contains anomalies. A video is treated as a single, integral input, and detection follows a video-level label assignment procedure. Spatial and temporal features are extracted by three-dimensional convolutions, and their relationship is then modeled using a long short-term memory (LSTM) network. The concise structure of the proposed method enables high computational efficiency, and extensive experiments demonstrate its effectiveness.
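
The pipeline in the abstract (3D convolution, then an LSTM, then one label for the whole video) can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' network. It assumes a single-channel clip, one 3D convolution layer with ReLU, max pooling over the spatial axes at each time step, a single-layer LSTM, and a sigmoid output giving a video-level anomaly probability. The layer sizes, the placement of the pooling, and all names (`conv3d_valid`, `lstm_last_hidden`, `video_anomaly_score`, the `params` keys) are hypothetical. The weights are random, so the printed score is meaningless until the model is trained on video-level labels.

```python
import numpy as np


def conv3d_valid(video, kernels):
    """Valid (no padding, stride 1) 3D convolution, computed as cross-correlation.

    video:   (T, H, W) single-channel clip
    kernels: (C, kt, kh, kw) bank of C filters
    returns: (C, T-kt+1, H-kh+1, W-kw+1)
    """
    C, kt, kh, kw = kernels.shape
    T, H, W = video.shape
    out = np.zeros((C, T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[1]):
        for y in range(out.shape[2]):
            for x in range(out.shape[3]):
                patch = video[t:t + kt, y:y + kh, x:x + kw]
                # Contract the spatio-temporal patch against all C filters at once.
                out[:, t, y, x] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_hidden(seq, Wx, Wh, b):
    """Run a single-layer LSTM over seq (T', D) and return the final hidden state.

    Stacked gate order in Wx, Wh, b is assumed to be (input, forget, cell, output).
    """
    hidden = Wh.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in seq:
        z = Wx @ x_t + Wh @ h + b             # (4 * hidden,)
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                     # update the memory cell
        h = o * np.tanh(c)                    # expose the gated cell state
    return h


def video_anomaly_score(video, params):
    """Map a whole clip to one probability that the video contains an anomaly."""
    feats = np.maximum(conv3d_valid(video, params["kernels"]), 0.0)  # ReLU, (C, T', H', W')
    # Max-pool over space at each time step: one C-dim feature vector per step.
    seq = feats.max(axis=(2, 3)).T                                    # (T', C)
    h = lstm_last_hidden(seq, params["Wx"], params["Wh"], params["b"])
    return float(sigmoid(params["w_out"] @ h + params["b_out"]))


rng = np.random.default_rng(0)
C, hidden = 4, 8
params = {
    "kernels": rng.normal(scale=0.1, size=(C, 3, 3, 3)),
    "Wx": rng.normal(scale=0.1, size=(4 * hidden, C)),
    "Wh": rng.normal(scale=0.1, size=(4 * hidden, hidden)),
    "b": np.zeros(4 * hidden),
    "w_out": rng.normal(scale=0.1, size=hidden),
    "b_out": 0.0,
}
video = rng.random((16, 32, 32))  # 16 grayscale frames of 32x32 pixels
score = video_anomaly_score(video, params)
print(f"video-level anomaly score: {score:.3f}")
```

Because only a single score is produced per video, such a model can be trained with a binary loss against weak, video-level labels, without frame-level annotations.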

Highlights

  • The extraction of spatial and temporal features is carried out by three-dimensional convolutions, and their relationship is further modeled using a long short-term memory (LSTM) network

  • Max-pooling operations are used to replace the division of a video and to capture the most prominent spatial-temporal features corresponding to possible anomalies

  • The proposed framework follows a sound procedure to extract the critical features for anomaly detection
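
The second highlight contrasts max pooling with dividing a video into segments. A small NumPy example shows why max pooling keeps a brief event that segment averaging dilutes. The segment baseline here (split the frame features into equal chunks and average each chunk) is a hypothetical stand-in for "division of a video", and the feature values are synthetic. `segment_mean_features` and `max_pooled_features` are illustrative names, not functions from the paper.

```python
import numpy as np


def segment_mean_features(frame_feats, n_segments):
    """Hypothetical baseline: split T frame features into equal chunks and average each chunk."""
    chunks = np.array_split(frame_feats, n_segments, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


def max_pooled_features(frame_feats):
    """Keep, per feature channel, the strongest response anywhere in the video."""
    return frame_feats.max(axis=0)


# Synthetic features: 64 frames x 2 channels; channel 0 fires for only 2 frames.
feats = np.zeros((64, 2))
feats[30:32, 0] = 1.0

segments = segment_mean_features(feats, 4)  # 4 segments of 16 frames each
pooled = max_pooled_features(feats)
print(segments[:, 0].tolist())  # → [0.0, 0.125, 0.0, 0.0]
print(pooled.tolist())          # → [1.0, 0.0]
```

Averaging scales the two-frame spike down by the segment length (2/16), while max pooling passes it through unchanged and does not require choosing a segment count.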

Summary

Introduction

Anomalies in a video include common irregularities, such as vandalism, assault, and traffic accidents, as well as events that are anomalous only in certain contexts, such as a car entering a pedestrian-only zone. Although identifying an abnormal object or event may seem to be the only critical factor, the context of a video is equally important for detection. Recent research suggests that detection and localization can be combined into a single end-to-end pipeline. However, the performance of such a combination remains to be explored, since using a video anomaly detector to localize abnormal frames may be less accurate than expected due to the detector's nonlinear characteristics.
