The adoption of computer-vision-based security systems for violence detection has the potential to significantly improve safety on public and private properties. However, developing these systems is challenging. We can use classification models to identify violence in images, or object detection models to identify firearms, which may indicate a robbery. Moreover, systems aimed at private environments face specific challenges, such as obtaining appropriate datasets to train the algorithms. Many publicly available violence-detection datasets consist of outdoor images containing elements such as streets and cars, which do not adequately reflect the characteristics of private properties. In this work, we evaluate both learned and handcrafted features for classifying videos as 'violence' or 'non-violence' across a variety of datasets, including a new dataset composed exclusively of closed-circuit television (CCTV) images. We also propose a new dataset for firearm detection in CCTV images and conduct experiments using YOLOv8. We hope these results clarify the design decisions involved in developing a security system for indoor environments.