Abstract
In recent years, violence detection has become an important research area in computer vision, and many high-accuracy models have been proposed. However, these methods generalize poorly across different datasets. In this paper, the authors propose a violence detection method based on a C3D two-stream network for spatiotemporal features. First, the authors preprocess the video data for the RGB stream and the optical flow stream separately. Second, the authors feed the data into two C3D networks, which extract features from the RGB frames and the optical flow respectively. Third, the authors fuse the features extracted by the two networks to obtain the final prediction. To evaluate the proposed model, four datasets are used (two public and two self-built). The experimental results show that the proposed model generalizes well compared with state-of-the-art methods: it performs well on both large-scale and small-scale datasets.
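The third step, fusing the two streams, can be illustrated with a minimal sketch. The abstract does not state how the fusion is done, so this example assumes score-level late fusion: each stream's class logits are converted to probabilities with a softmax and then combined by a weighted average. The function names (`softmax`, `fuse_scores`, `predict`), the `rgb_weight` parameter, and the two class labels are hypothetical and chosen only for illustration; the logits stand in for the outputs of the two C3D networks.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(rgb_logits, flow_logits, rgb_weight=0.5):
    """Weighted average of the per-class probabilities of the two streams.

    Assumption: score-level late fusion. The source only says the features
    of the two networks are fused, not how.
    """
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both streams must score the same number of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    rgb_probs = softmax(rgb_logits)
    flow_probs = softmax(flow_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


def predict(rgb_logits, flow_logits, labels=("non-violent", "violent"),
            rgb_weight=0.5):
    """Return the label with the highest fused probability, plus the scores."""
    fused = fuse_scores(rgb_logits, flow_logits, rgb_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


if __name__ == "__main__":
    # Hypothetical logits: the RGB stream leans "non-violent",
    # the optical flow stream leans strongly "violent".
    label, probs = predict([2.0, 0.5], [0.1, 3.0])
    print(label, [round(p, 3) for p in probs])
```

With equal weights, a confident optical flow stream can override a weakly opposing RGB stream. Other fusion choices, such as concatenating the feature vectors before a shared classifier, are equally compatible with the description in the abstract.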
Highlights
As society develops, harmonious and stable public security becomes crucial.
The authors propose a violence detection method based on a C3D two-stream network for spatiotemporal features.
Related methods listed:
Using the degree of motion and sound combined with the characteristics of blood and flame
Features extracted from RGB images and optical flow images using Local Histogram of Oriented Gradient (LHOG) and Local Histogram of Optical Flow (LHOF)
Violent flow descriptors (ViF)
Oriented violent flow descriptor (OViF)
Improved dense trajectories
Statistical characteristics of the optical flow (SCOF)
Integrated audio and video features
Two-Stream network
Based on Two-Stream network, but using VGG-16 network to replace Convolutional Neural Network (CNN)
Based on Two-Stream network, but classified with support vector machine (SVM)
3D Convolutional network (C3D)
C3D-based violence detection
Summary
As society develops, harmonious and stable public security becomes crucial. A number of models have been proposed in the last few decades (Ramzan, Abid, et al., 2019), such as the two-stream network (Simonyan, Zisserman, 2014) and the 3D Convolutional network (C3D) (Tran, Bourdev, et al., 2015; Ji, Xu, et al., 2012). Each of these methods extracts the temporal and spatial information of the video in its own way and has its own characteristics. Inspired by the excellent performance of the two-stream network and the C3D network in action recognition, the authors tackle the challenges mentioned above by proposing a C3D-based two-stream violence detection model. Existing violence detection datasets suffer from insufficient data and a lack of scene diversity. To address these problems, the authors collect violent videos from websites and process them.
More From: International Journal of Cognitive Informatics and Natural Intelligence