Abstract
Research on detecting violent behavior in videos has made considerable progress, which supports the monitoring of abnormal videos spread over the network and thereby helps to purify the online environment. Many current violence detection models perform well in experimental environments, but their generalization ability is insufficient. Because violent behavior occurs in a wide variety of scenarios, automatic detection requires a model that generalizes well. In this paper, we design a crowd violence detection model with good generalization ability, based on human contours and dynamic characteristics. Generalization is improved by focusing on the human features in the video and by using human dynamic features obtained from adjacent frames. The model, which we call HD-Net, uses a 3D-CNN framework to extract spatial features from the input feature map and an LSTM to fuse temporal features. Through multiple comparative experiments, we test the generalization ability of HD-Net on three datasets: RLVS, Hockey and Violent Flow. Comparison with other classical violence detection models verifies the model's good generalization ability.
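As an illustration of the idea of dynamic features obtained from adjacent frames, the following is a minimal sketch only. It assumes that the dynamic feature is the absolute per-pixel difference between consecutive grayscale frames; the abstract does not state how HD-Net computes its dynamic features, so this choice, the names `frame_difference` and `dynamic_features`, and the toy clip are all hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel intensities


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Absolute per-pixel difference between two adjacent frames (assumed motion cue)."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def dynamic_features(frames: List[Frame]) -> List[Frame]:
    """One motion map per adjacent frame pair, so T frames give T - 1 maps."""
    return [frame_difference(a, b) for a, b in zip(frames, frames[1:])]


# Hypothetical 2x2 clip: one pixel brightens, then the scene stays still.
clip = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
motion = dynamic_features(clip)
```

Under this assumption, static background contributes zeros to every motion map and only moving regions remain, which is one plausible reason such features could transfer across scenes. In a full pipeline, maps of this kind would be stacked with the contour features before being passed to the spatial and temporal stages.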