Abstract

Violence detection in surveillance videos is a complicated task because it requires extracting spatio-temporal features across diverse video environments and viewing perspectives. In this paper, several architectures are proposed to perform this task with high performance, using the UBI-Fights dataset as a comprehensive case study. The proposed architectures combine Convolutional Block Attention Modules (CBAM) with other simple layers (e.g., ConvLSTM2D, or Conv2D followed by LSTM). In addition, the Categorical Focal Loss (CFL) is used as the loss function during training to increase the focus on the most important features. The proposed architectures are evaluated mainly with the Area Under the Curve (AUC) and the Equal Error Rate (EER), which measure each architecture's ability to identify violence correctly with little confusion between the classes. The results show that the proposed architectures outperform state-of-the-art techniques. For example, the Conv2D&LSTM-based architecture achieves an AUC of 0.9493 and an EER of 0.0507, outperforming most of the other proposed architectures as well as the state of the art.
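The loss and the two evaluation metrics named above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `gamma=2.0` and `alpha=0.25` are the commonly used focal-loss defaults rather than values reported in the paper, scores are assumed to be per-sample violence probabilities with label 1 meaning "violent", and the EER is approximated by the ROC point where the false positive and false negative rates are closest.

```python
import numpy as np


def categorical_focal_loss(y_true, y_prob, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean categorical focal loss over a batch (hypothetical helper).

    y_true: one-hot labels, shape (N, C).
    y_prob: predicted class probabilities (e.g. a softmax output), shape (N, C).
    With gamma=0 and alpha=1 this reduces to ordinary categorical cross-entropy.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    cross_entropy = -y_true * np.log(y_prob)
    # The modulating factor (1 - p)^gamma down-weights samples the model
    # already classifies confidently, so training concentrates on hard ones.
    modulating = alpha * (1.0 - y_prob) ** gamma
    return float(np.mean(np.sum(modulating * cross_entropy, axis=1)))


def roc_points(labels, scores):
    """FPR/TPR pairs for every distinct score threshold (label 1 = violent)."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    labels, scores = labels[order], scores[order]
    # Keep only the last index of each run of tied scores.
    cut = np.r_[np.where(np.diff(scores) != 0)[0], labels.size - 1]
    tps = np.cumsum(labels)[cut]
    fps = (cut + 1) - tps
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def auc_and_eer(labels, scores):
    """Return (AUC, EER) for binary labels and violence scores (hypothetical helper)."""
    fpr, tpr = roc_points(labels, scores)
    # Trapezoidal area under the ROC curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    # EER: operating point where the false positive rate equals the false
    # negative rate; approximated by the ROC point where they are closest.
    fnr = 1.0 - tpr
    i = int(np.argmin(np.abs(fnr - fpr)))
    eer = float((fpr[i] + fnr[i]) / 2.0)
    return auc, eer
```

A lower EER and a higher AUC both indicate better separation of violent and non-violent samples; a perfectly separating scorer gives an AUC of 1 and an EER of 0.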
