Abstract

Violent behavior recognition is an important research problem within behavior recognition and has broad application prospects in online content review and intelligent security. Inspired by the long short-term memory (LSTM) network, we hypothesize that the temporal shift module (TSM) leaves room for improvement in extracting long-term information. To test this conjecture, we carried out a series of explorations based on TSM. After many attempts, we propose connecting two TSMs in a cascade, which expands the model's temporal receptive field. In addition, an efficient channel attention (ECA) module is introduced at the front end of the network to strengthen the model's spatial feature extraction. Because behavior recognition models are prone to overfitting, we also extended and processed several open-source datasets to form a larger violence dataset, which addressed the overfitting problem. The experimental results show that the proposed algorithm improves the model's ability to extract features of violent behavior in both the spatial and temporal dimensions and can recognize violent behavior, which supports the above hypothesis.
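To make the shift operation underlying TSM concrete, here is a minimal pure-Python sketch. It follows the commonly used TSM formulation: a fraction 1/fold_div of the channels is shifted one frame backward in time, another fraction one frame forward, and the rest stay in place, with zeros filled in at the clip boundaries. The names `temporal_shift` and `active_frames`, the `fold_div=8` default, and the (T, C) layout with spatial dimensions omitted are assumptions for illustration, not details confirmed by this paper. Applying the shift twice shows how stacking shift stages lets information travel further along the time axis.

```python
from typing import List

# Assumed layout for illustration: frames[t][c]; spatial dimensions are omitted
# because the shift acts only along the time axis.
Frames = List[List[float]]


def temporal_shift(x: Frames, fold_div: int = 8) -> Frames:
    """Shift part of the channels along time, zero-filling at the clip edges."""
    num_frames = len(x)
    num_channels = len(x[0])
    fold = num_channels // fold_div
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                # First fold: frame t receives channel c from frame t + 1.
                if t + 1 < num_frames:
                    out[t][c] = x[t + 1][c]
            elif c < 2 * fold:
                # Second fold: frame t receives channel c from frame t - 1.
                if t - 1 >= 0:
                    out[t][c] = x[t - 1][c]
            else:
                # Remaining channels are left unshifted.
                out[t][c] = x[t][c]
    return out


def active_frames(x: Frames, channel: int) -> List[int]:
    """Frames in which the given channel holds a non-zero value."""
    return [t for t in range(len(x)) if x[t][channel] != 0.0]


# An impulse at frame 3 in every channel of a 7-frame, 8-channel clip.
clip = [[1.0 if t == 3 else 0.0 for _ in range(8)] for t in range(7)]
once = temporal_shift(clip)
twice = temporal_shift(once)
print(active_frames(once, 0), active_frames(once, 1))    # [2] [4]
print(active_frames(twice, 0), active_frames(twice, 1))  # [1] [5]
```

With no convolution in between, two back-to-back shifts simply move the same channels two frames. In a real network the per-frame convolutions between shift stages mix channels, so each stage widens the temporal context available to every channel.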

Highlights

  • With the rapid popularization of mobile terminals, massive amounts of video are continuously uploaded to the Internet, and some of this video may contain violent scenes that harm the health of the online environment.

  • Data collection and multimedia processing were performed on existing open-source datasets to build an expanded violent behavior recognition dataset, which addresses overfitting and verifies the algorithm's performance on a larger sample.

  • From Liang et al., "Violence behavior recognition of two-cascade temporal shift module with attention mechanism": channel attention is computed without reducing the dimensionality; local cross-channel interaction is realized through a one-dimensional convolution, whose output is activated by the nonlinear sigmoid function.
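The highlight above matches the efficient channel attention design from the original ECA-Net paper: global average pooling per channel, a one-dimensional convolution across neighboring channels with no dimensionality reduction, and a sigmoid gate that rescales each channel. Below is a minimal pure-Python sketch under stated assumptions: the function names, the flattened (C, H*W) feature-map layout, and passing the 1-D kernel weights in directly (they are learned in practice) are illustrative choices. The adaptive kernel-size rule with gamma=2 and b=1 is the one proposed in ECA-Net; this page does not say which kernel size Liang et al. actually use.

```python
import math
from typing import List


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net adaptive kernel size: |(log2 C + b) / gamma|, rounded to an odd value."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature_map: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Reweight the channels of a (C, H*W) feature map with efficient channel attention.

    `weights` is the 1-D kernel (odd length); in a trained model it is learned.
    """
    channels = len(feature_map)
    k = len(weights)
    pad = k // 2
    # 1) Global average pooling: one descriptor per channel, no dimensionality reduction.
    pooled = [sum(ch) / len(ch) for ch in feature_map]
    # 2) 1-D convolution over neighboring channels; zero padding keeps the length at C.
    mixed = []
    for c in range(channels):
        acc = 0.0
        for j in range(k):
            idx = c + j - pad
            if 0 <= idx < channels:
                acc += weights[j] * pooled[idx]
        mixed.append(acc)
    # 3) Sigmoid turns each mixed descriptor into an attention weight in (0, 1).
    attention = [1.0 / (1.0 + math.exp(-v)) for v in mixed]
    # 4) Scale every spatial position of each channel by that channel's weight.
    return [[attention[c] * v for v in feature_map[c]] for c in range(channels)]


print(eca_kernel_size(64), eca_kernel_size(256))  # 3 5
```

Because the convolution only spans k neighboring channels, the attention branch adds just k parameters regardless of the channel count, which is why ECA is described as efficient compared with attention blocks that use fully connected layers with dimensionality reduction.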


Summary

Introduction

With the rapid popularization of mobile terminals, massive amounts of video are continuously uploaded to the Internet, and some of this video may contain violent scenes that harm the health of the online environment. According to the feature extraction model used, current deep-learning methods for behavior recognition fall into three categories: two-stream CNN models, temporal models, and spatiotemporal models. The TSM network captures only limited long-term information during behavior recognition, its structure is overly simple, and it is prone to overfitting during feature learning. The contributions are as follows:

(1) A simple two-cascade TSM is proposed, which expands the receptive field in the temporal dimension and enhances the extraction of long-term information.

(2) An efficient channel attention (ECA) module is introduced at the front end of the TSM network, which improves the network's extraction of spatial information to a certain extent and reduces the impact of overfitting on network performance.

(3) Data collection and multimedia processing are performed on existing open-source datasets to build an expanded violent behavior recognition dataset, which addresses overfitting and verifies the algorithm's performance on a larger sample.
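Contribution (1) rests on a counting argument: if each TSM stage lets frame t draw on frames t-1, t, and t+1 (a shift followed by a per-frame convolution that mixes channels), then stacking stages widens the temporal receptive field by one frame on each side per stage. The sketch below computes that set of frames; the function name `temporal_receptive_field` and the one-frame-per-stage model are assumptions for illustration, not the exact wiring of the paper's two-cascade TSM residual module.

```python
from typing import Set


def temporal_receptive_field(frame: int, num_frames: int, num_stages: int) -> Set[int]:
    """Frames that can influence `frame` after `num_stages` shift-and-convolve stages.

    Assumes each stage connects frame t to frames t - 1, t, and t + 1.
    """
    reachable = {frame}
    for _ in range(num_stages):
        # Each stage pulls in one neighboring frame on each side, clipped to the clip.
        reachable = {
            s
            for t in reachable
            for s in (t - 1, t, t + 1)
            if 0 <= s < num_frames
        }
    return reachable


single = temporal_receptive_field(frame=4, num_frames=8, num_stages=1)
cascade = temporal_receptive_field(frame=4, num_frames=8, num_stages=2)
print(sorted(single))   # [3, 4, 5]
print(sorted(cascade))  # [2, 3, 4, 5, 6]
```

Under this simplified model, a single stage sees 3 frames and a two-stage cascade sees 5. How much the paper's module actually gains depends on how the two TSMs are placed inside the residual block.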

Temporal Shift Module
Efficient Channel Attention Module
Intuition
Two-Cascade TSM Residual Module
Dataset
Parameter Configuration
Results
Discussion