Abstract

The continuous development of intelligent video surveillance systems has increased the demand for vision-based methods that automatically detect anomalous behaviors in video scenes. Several methods in the literature detect such anomalies by exploiting the motion features associated with different actions. To detect anomalies efficiently while capturing the features specific to each behavior, model complexity, and hence computational cost, must be reduced. This paper presents a lightweight framework (LightAnomalyNet) comprising a convolutional neural network (CNN) that is trained on input frames obtained by a computationally cost-effective method. The proposed framework effectively represents and differentiates between normal and abnormal events. In particular, this work defines human falls, certain kinds of suspicious behavior, and violent acts as abnormal activities, and discriminates them from other (normal) activities in surveillance videos. Experiments on public datasets show that LightAnomalyNet outperforms existing methods in terms of both classification accuracy and the cost of input frame generation.

Highlights

  • As a part of continuously strengthening video surveillance systems, the automated detection of abnormal behaviors is becoming more relevant [1,2]

  • One motivation behind using stacked grayscale 3-channel image (SG3I) within LightAnomalyNet framework was to eliminate the need for expensive motion representation methods, such as optical flow [20] and dynamic images [53]

  • As the dataset of SG3I images works with any pre-trained network, we evaluated the performance of the combination of the proposed lightweight convolutional neural network (CNN) and SG3Is compared to the other deep networks commonly used in the literature for abnormity detection


Introduction

As part of the continuous strengthening of video surveillance systems, the automated detection of abnormal behaviors is becoming more relevant [1,2]. Kim et al. [14] provided an interesting approach that, on the one hand, encodes temporal information efficiently and, on the other, eliminates the need to train highly complex 3D convolutional neural networks (3D CNNs) on large video datasets. They proposed the stacked grayscale 3-channel image (SG3I) format [14], which contains reasonably rich motion information at reduced computational expense compared with other approaches such as optical flow [20]. They use a two-stream 2D architecture pre-trained on image datasets to learn motion features for behavior recognition.
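The core idea of SG3I is that three consecutive video frames, converted to grayscale, can be stacked into the R, G, and B channels of a single image, so that short-term motion appears as differences between channels and a standard 2D CNN can consume it directly. The sketch below illustrates this construction with NumPy; the function name and the BT.601 luma coefficients are our assumptions for illustration, not necessarily the exact formulation of Kim et al. [14].

```python
import numpy as np

def make_sg3i(frames):
    """Build a stacked grayscale 3-channel image (SG3I) from three
    consecutive RGB frames, each a (H, W, 3) uint8 array.

    Each frame is reduced to grayscale and placed into one channel of
    the output, so inter-frame motion is encoded as channel differences.
    """
    assert len(frames) == 3, "SG3I stacks exactly three frames"
    # ITU-R BT.601 luma coefficients (an assumed grayscale conversion)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    grays = [np.tensordot(f.astype(np.float32), weights, axes=([-1], [0]))
             for f in frames]
    # One grayscale frame per channel: shape (H, W, 3)
    sg3i = np.stack(grays, axis=-1)
    return np.clip(np.rint(sg3i), 0, 255).astype(np.uint8)
```

A static scene yields three nearly identical channels (the SG3I looks gray), while moving regions produce channel disparities that appear as color fringes, which is the motion cue the 2D CNN learns from.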

