Abstract

Previous research on violence detection in public spaces offers few intelligent security systems, and earlier efforts have focused only on system accuracy, not on speed or operability. This defeats the purpose of security systems, which are safety-critical. To address this, we developed a violence recognition framework for surveillance systems that uses a two-stream neural network architecture to decompose complex actions. Because no publicly available dataset from static surveillance cameras treats violence recognition as complex action recognition, the proponents organized a multi-view dataset for determining simple violent actions. The main pipeline uses state-of-the-art object detection architectures to extract spatial features and optical flow estimation to extract temporal features. These features are fused with a boost fusion algorithm to determine simple actions. The simple actions are then classified as violent or non-violent by logistic regression, which completes the complex action recognition. Experimental results show that the proposed framework achieves an accuracy of 81.2% and a recall of 81.7%. Previous implementations that reported their operating speed reached 25 FPS on shallow neural networks, whereas this research reaches 21 FPS on a very deep network. This shows that the framework achieves competent accuracy while operating fast enough for real-time, safety-critical use.
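The two-stage decision described above (per-stream simple-action scores are fused, then logistic regression maps the fused simple-action profile to a violent or non-violent label) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the boost fusion algorithm, so a plain weighted average of per-class scores (score-level late fusion) stands in for it. The simple-action labels, the helper names `late_fuse`, `train_logreg`, and `predict_violent`, and the toy training data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical simple-action labels; the paper's actual label set is not given in the abstract.
SIMPLE_ACTIONS = ["punch", "kick", "push", "walk", "stand"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def late_fuse(spatial: Sequence[float], temporal: Sequence[float],
              w_spatial: float = 0.5) -> List[float]:
    """Weighted average of per-class scores from the spatial and temporal streams.

    Hypothetical stand-in for the paper's boost fusion algorithm, whose details are not given.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both streams must score the same simple actions")
    w_temporal = 1.0 - w_spatial
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit binary logistic regression (1 = violent) with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # gradient of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_violent(w: Sequence[float], b: float, x: Sequence[float],
                    threshold: float = 0.5) -> bool:
    """Complex-action decision: is this fused simple-action profile violent?"""
    return sigmoid(dot(w, x) + b) >= threshold


# Toy, made-up training data (not from the paper): each row is a fused simple-action score vector.
TOY_X = [
    [0.80, 0.10, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.05, 0.10, 0.75, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.80, 0.05],
    [0.02, 0.03, 0.05, 0.10, 0.80],
    [0.10, 0.05, 0.05, 0.40, 0.40],
]
TOY_Y = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_logreg(TOY_X, TOY_Y)
    # Hypothetical per-stream scores for one clip: spatial stream, then temporal stream.
    fused = late_fuse([0.7, 0.2, 0.05, 0.03, 0.02], [0.5, 0.3, 0.1, 0.05, 0.05])
    print("violent" if predict_violent(w, b, fused) else "non-violent")
```

Feeding the fused simple-action scores into a small second-stage classifier mirrors the abstract's decomposition of a complex action (violence) into simple actions, and a linear model like logistic regression adds little cost next to the deep spatial and temporal feature extractors.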
