Abstract

Intelligent video surveillance systems are rapidly being introduced to public places. The adoption of computer vision and machine learning techniques enables various applications for collected video features; one of the major applications is safety monitoring. The efficacy of such a system is measured by the efficiency and accuracy of violent event detection. In this paper, we present a novel architecture for violence detection from video surveillance cameras. Our proposed model is a U-Net-like network that extracts spatial features using MobileNet V2 as an encoder, followed by an LSTM for temporal feature extraction and classification. The proposed model is computationally light and still achieves good results: experiments showed an average accuracy of 0.82 ± 2% and an average precision of 0.81 ± 3% on a complex real-world security-camera footage dataset based on RWF-2000.
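The described pipeline can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: a tiny two-level convolutional encoder with one skip connection stands in for the MobileNet V2 U-Net-like backbone, and the hidden sizes, frame counts, and classifier head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViolenceDetector(nn.Module):
    """Simplified sketch of the paper's pipeline: per-frame spatial
    features from a U-Net-like encoder-decoder (a tiny CNN stands in
    for MobileNet V2), then an LSTM over frames and a binary head."""

    def __init__(self, hidden=64):
        super().__init__()
        # Downsampling path; the paper uses MobileNet V2 here.
        self.down1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        # U-Net-like upsampling path with a skip connection from down1.
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.fuse = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)      # spatial map -> one vector per frame
        self.lstm = nn.LSTM(16, hidden, batch_first=True)  # temporal modelling
        self.head = nn.Linear(hidden, 1)          # violence / no violence score

    def forward(self, clips):
        # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        x = clips.flatten(0, 1)                   # run every frame through the encoder
        s1 = self.down1(x)
        s2 = self.down2(s1)
        u = self.up(s2)
        u = self.fuse(torch.cat([u, s1], dim=1))  # skip connection, U-Net style
        feats = self.pool(u).flatten(1).view(b, t, -1)
        out, _ = self.lstm(feats)
        return torch.sigmoid(self.head(out[:, -1]))  # prediction from last time step

model = ViolenceDetector()
scores = model(torch.randn(2, 8, 3, 64, 64))  # 2 clips of 8 frames, 64x64 each
print(scores.shape)  # torch.Size([2, 1])
```

In the real system, the per-frame encoder would be initialized with pretrained MobileNet V2 weights, which is what keeps the model computationally light relative to 3D-convolution approaches.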

Highlights

  • Further tests on real-life datasets are needed, since the Hockey Fights and Movie Fights datasets are of limited value as training examples for violent scenes: the violence in those videos is staged rather than naturally occurring

  • In the future, combining features from several deep networks could take this difficult problem a step further, enabling crowd violence detection, algorithms suitable for UAVs, and classification methods that work under conditions of limited annotated data

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
