Violence behavior recognition of two-cascade temporal shift module with attention mechanism

Qiming Liang, Kaikai Yang, Yong Li, Bowei Chen

doi:10.1117/1.jei.30.4.043009

Abstract

Violence behavior recognition is an important research topic within behavior recognition, with broad application prospects in network content review and intelligent security. Inspired by long short-term memory networks, we conjectured that the temporal shift module (TSM) has room for improvement in extracting long-term information. To test this conjecture, we experimented with TSM-based variants and propose connecting two TSMs in cascade, which expands the receptive field of the model. In addition, an efficient channel attention module is introduced at the front end of the network to strengthen the model's spatial feature extraction. Because behavior recognition is prone to overfitting, we also extended and processed several open-source datasets to form a larger violence dataset, which addressed the overfitting problem. The experimental results show that the proposed algorithm improves the model's extraction of violent-behavior features in both the spatial and temporal dimensions and recognizes violent behavior, which supports the conjecture.

Highlights

With the rapid spread of mobile terminals, massive amounts of video data are uploaded to the Internet all the time, and this data is likely to include violent scenes, which harm the health of the network environment.
(3) Data collection and multimedia processing are performed on existing open-source datasets to build an expanded violent behavior recognition dataset, which solves the overfitting problem and verifies the algorithm's performance on a larger sample.
[...] without reducing the dimensionality; local cross-channel interaction is realized through one-dimensional convolution, and the result is activated by the nonlinear sigmoid function.
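
The channel attention step in the last highlight (no dimensionality reduction, a one-dimensional convolution across neighbouring channels, then a sigmoid) can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. The function name `eca`, the kernel of size 3, and the zero padding at the channel boundaries are assumptions made for the example, and the global average pooling step follows the published ECA-Net design rather than anything stated on this page.

```python
import numpy as np

def eca(x, kernel):
    """Channel attention sketch for one feature map x of shape (C, H, W).

    kernel: odd-length 1D weights shared by all channels (hypothetical values;
    in a trained network these would be learned).
    """
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    c = x.shape[0]
    # Global average pooling: one descriptor per channel, all C channels kept.
    pooled = x.mean(axis=(1, 2))
    # Zero-pad so the 1D convolution over the channel axis returns C values.
    padded = np.pad(pooled, k // 2)
    # Local cross-channel interaction: each channel sees its k nearest neighbours.
    mixed = np.array([padded[i:i + k] @ kernel for i in range(c)])
    # Sigmoid turns the mixed descriptors into per-channel weights in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-mixed))
    # Rescale every channel of the input by its weight.
    return x * weights[:, None, None], weights

# Usage with made-up data: 4 channels of 2x2 features.
features = np.random.default_rng(0).normal(size=(4, 2, 2))
scaled, channel_weights = eca(features, kernel=[0.25, 0.5, 0.25])
print(channel_weights)
```

Because the convolution kernel has only `k` weights regardless of the number of channels, the attention step adds very few parameters, which is the sense in which the module is "efficient".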

Summary

Introduction

With the rapid spread of mobile terminals, massive amounts of video data are uploaded to the Internet all the time, and this data is likely to include violent scenes, which harm the health of the network environment. According to the feature extraction model used, current deep-learning methods for behavior recognition fall into three categories: two-stream CNN models, temporal models, and spatiotemporal models. The TSM network acquires only limited long-term information during behavior recognition, its structure is too simple, and it is prone to overfitting during feature learning. The work makes three contributions:
(1) A simple two-cascade TSM is proposed, which expands the receptive field in the temporal dimension and enhances the extraction of long-term information.
(2) The efficient channel attention (ECA) module is introduced at the front end of the TSM network, which improves the network's extraction of spatial features to a certain extent and reduces the impact of overfitting on network performance.
(3) Data collection and multimedia processing are performed on existing open-source datasets to build an expanded violent behavior recognition dataset, which solves the overfitting problem and verifies the algorithm's performance on a larger sample.
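
Why cascading two TSMs widens the temporal receptive field can be seen in a small NumPy sketch. It is a toy illustration under stated assumptions, not the authors' residual module: the 1/8 share of channels shifted in each direction is chosen for the example, `mix_channels` is a hypothetical stand-in for the convolution that follows a shift, and there are no learned weights or residual connections.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift part of the channels along time. x has shape (T, C, H, W)."""
    fold = x.shape[1] // fold_div
    out = np.zeros_like(x)
    # First group: frame t receives features from frame t + 1.
    out[:-1, :fold] = x[1:, :fold]
    # Second group: frame t receives features from frame t - 1.
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]
    # Remaining channels stay where they are.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out

def mix_channels(x):
    """Hypothetical stand-in for a convolution: average over channels."""
    return np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape).copy()

def shift_block(x):
    """One shift followed by channel mixing."""
    return mix_channels(temporal_shift(x))

def active_frames(x):
    """Indices of frames that carry any nonzero feature."""
    return np.flatnonzero(x.reshape(x.shape[0], -1).sum(axis=1)).tolist()

# An impulse: 7 frames, 8 channels, only frame 3 is nonzero.
x = np.zeros((7, 8, 1, 1))
x[3] = 1.0

one = shift_block(x)       # a single block
two = shift_block(one)     # two blocks in cascade

frames_one = active_frames(one)
frames_two = active_frames(two)
print(frames_one)  # [2, 3, 4]
print(frames_two)  # [1, 2, 3, 4, 5]
```

After one block the impulse reaches one frame on either side; after two blocks it reaches two frames on either side, so each output frame of the cascade depends on a window of five input frames instead of three.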

Temporal Shift Module

Efficient Channel Attention Module

Intuition

Two-Cascade TSM Residual Module

Dataset

Parameter Configuration

Results

Discussion

Journal: Journal of Electronic Imaging
Publication Date: Jul 21, 2021
Citations: 15
License type: CC BY
