Abstract

Violence in any form is a disgrace to the civilized world. Even today, it remains a part of society and claims many innocent lives. Firearms are one of the conventional means of violence, and firearm-related death is a global phenomenon: a threat to society and a challenge to law enforcement agencies. A significant portion of such crimes happens in cities and semi-urban areas. Governments and private organizations widely use CCTV-based surveillance for monitoring and prevention. However, human monitoring requires a significant number of person-hours and is prone to mistakes, whereas automated smart surveillance for violent activities scales better and is more reliable. The main focus of this paper is to show that deep learning-based techniques can be combined to detect firearms (particularly guns). The paper uses different detection techniques, such as Faster Region-Based Convolutional Neural Networks (Faster RCNN) and the latest EfficientDet-based architectures, to detect guns and human faces. An ensemble (stacked) scheme improves the detection of human faces and guns at the post-processing level using Non-Maximum Suppression, Non-Maximum Weighted, and Weighted Boxes Fusion techniques. The paper empirically compares the results of the individual detection techniques and their ensembles. Such detection can help the police gather quick intelligence about an incident and take preventive measures at the earliest. The same method can also be used to detect gun-based content in social-media videos.
Here, the Weighted Boxes Fusion-based ensemble detection scheme achieves a mean average precision of 77.02%, 16.40%, and 29.73% for mAP<sub>0.5</sub>, mAP<sub>0.75</sub>, and mAP<sub>[0.50:0.95]</sub>, respectively. These results are the best among all the alternatives tested. The model has been rigorously tested on unseen test images and movie clips. The ensemble schemes perform satisfactorily and consistently improve over the primary models.
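
The abstract does not describe the fusion step in detail, so the following is only a minimal sketch of the general idea behind Weighted Boxes Fusion, not the authors' implementation. It assumes a single class, boxes given as `(x1, y1, x2, y2, score)` tuples, and an IoU threshold of 0.55; the function names, the box layout, and the threshold are all illustrative choices. Unlike Non-Maximum Suppression, which keeps the top-scoring box and discards its overlapping neighbours, this scheme averages overlapping boxes from the different detectors, weighting each by its confidence.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2, score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (the score field is ignored)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster: List[Box]) -> Box:
    """Confidence-weighted average of the coordinates, mean of the scores."""
    total = sum(b[4] for b in cluster)
    coords = [sum(b[i] * b[4] for b in cluster) / total for i in range(4)]
    return (coords[0], coords[1], coords[2], coords[3], total / len(cluster))


def weighted_boxes_fusion(per_model: List[List[Box]],
                          iou_thr: float = 0.55) -> List[Box]:
    """Fuse single-class detections from several models (simplified)."""
    # Pool all boxes with a positive score, highest confidence first.
    boxes = sorted(
        (b for model in per_model for b in model if b[4] > 0),
        key=lambda b: b[4],
        reverse=True,
    )
    clusters: List[List[Box]] = []
    fused: List[Box] = []
    for box in boxes:
        # Find the fused box that overlaps this one the most, if any.
        best, best_iou = -1, iou_thr
        for i, f in enumerate(fused):
            overlap = iou(box, f)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best >= 0:
            clusters[best].append(box)
            fused[best] = fuse(clusters[best])
        else:
            clusters.append([box])
            fused.append(box)
    # Lower the score of boxes that only some of the models agreed on.
    n = len(per_model)
    return [(*f[:4], f[4] * min(len(c), n) / n)
            for f, c in zip(fused, clusters)]


# Made-up detections from two hypothetical models for one image.
model_a = [(10.0, 10.0, 50.0, 50.0, 0.9)]
model_b = [(12.0, 12.0, 52.0, 52.0, 0.6), (200.0, 200.0, 240.0, 240.0, 0.4)]
result = weighted_boxes_fusion([model_a, model_b])
```

In this toy input, the two overlapping boxes merge into a single box that sits closer to the higher-confidence detection, while the box reported by only one model is kept with a reduced score.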
