Detecting violence in video streams is essential for public safety and security given the rising frequency of violent incidents. Although CCTV surveillance is widely deployed, human monitoring capacity cannot keep pace with the demand for vigilant supervision. This research presents a new lightweight model that addresses this gap by accurately identifying and categorizing violent behavior across diverse scenarios, including CCTV footage. The proposed method leverages optical flow and RGB data to capture spatiotemporal features in violence data. Built on a residual DLCNN architecture integrated with an attention mechanism and GRU components, the model handles high-dimensional video data effectively, improving accuracy by prioritizing the frames most indicative of violent and nonviolent instances. The model's performance was validated on the Hockey Fights (HF), Movie Fights, and SCVD datasets, achieving accuracies of 98.38%, 99.62%, and 90.57%, respectively. We also developed the Extended Automatic Violence Detection Dataset (EAVDD), comprising 1530 videos of violent scenes from movies, public spaces, social media, and sports. Testing the model on fight scenes from highly rated movies yielded strong results. This research supports surveillance systems and advances short-video analysis and understanding, with applications in public safety, social media, sports, and law enforcement.
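The abstract does not specify the model's exact layers, but the core temporal idea (a recurrent GRU pass over per-frame features, followed by an attention step that weights the most informative frames before classification) can be illustrated with a minimal NumPy sketch. All function names, weight shapes, and dimensions below are hypothetical, chosen only to show the mechanism, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One step of a minimal GRU cell (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde

def attention_pool(states, w):
    """Score each frame's hidden state, softmax, and take the weighted sum."""
    scores = np.array([w @ h for h in states])
    scores -= scores.max()                     # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    pooled = sum(a * h for a, h in zip(alpha, states))
    return alpha, pooled

# Demo: 6 frames of 8-dim features, 4-dim hidden state (toy sizes)
rng = np.random.default_rng(0)
T, d_in, d_h = 6, 8, 4
frames = rng.standard_normal((T, d_in))
Wz, Wr, Wh = (rng.standard_normal((d_h, d_in)) * 0.1 for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((d_h, d_h)) * 0.1 for _ in range(3))
w_att = rng.standard_normal(d_h)

h = np.zeros(d_h)
states = []
for x in frames:                # recurrent pass over the frame sequence
    h = gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh)
    states.append(h)

alpha, clip_feature = attention_pool(states, w_att)
```

Here `alpha` is a probability distribution over frames: in a trained model, frames containing violent motion would receive higher weight, and `clip_feature` would feed a final violent/nonviolent classifier. In the actual system, the per-frame features would come from the residual CNN applied to RGB and optical-flow inputs rather than random vectors.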