Abstract

Automated detection of aggressive and violent behaviour in videos has immense potential. It enables efficient online content filtering by restricting access to extreme content and, when integrated with security systems, helps to monitor violence in surveillance videos. In this work, a convolutional neural network is combined with the proposed Spatial and Channel wise Attention‐based ConvLSTM encoder (SCan‐ConvLSTM). The proposed architecture performs spatiotemporal fusion of the features extracted from video sequences containing fight scenes. To focus selectively on the most important regions, the blended attention mechanism adjusts the weights of the outputs at different spatial locations and across different channels. Because the attention is applied recurrently, the activation maps are refined step by step over the sequence, which boosts model performance. Experimental results show that the proposed architecture achieves superior results on the benchmark datasets (RWF‐2000, Violent‐flow, Hockey‐fights, and Movies).
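To make the idea of weighting outputs "at different locations and across different channels" concrete, the following is a minimal sketch of a blended spatial and channel-wise attention step on a single feature map. It is not the authors' SCan‐ConvLSTM implementation: the function names, the weight matrix `w`, the projection vector `v`, and the use of sigmoid gating are all assumptions made for illustration, and the recurrent ConvLSTM part is omitted entirely.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_weights(feat, w):
    """One weight per channel, from globally average-pooled activations.

    feat: nested list of shape (C, H, W); w: hypothetical (C, C) weight matrix.
    """
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    return [sigmoid(sum(wij * p for wij, p in zip(w_row, pooled))) for w_row in w]

def spatial_weights(feat, v):
    """One weight per location, from a projection across channels.

    v: hypothetical length-C vector, playing the role of a 1x1 convolution.
    """
    height, width = len(feat[0]), len(feat[0][0])
    return [[sigmoid(sum(v[c] * feat[c][i][j] for c in range(len(feat))))
             for j in range(width)] for i in range(height)]

def blended_attention(feat, w, v):
    """Rescale every activation by its channel weight and its location weight."""
    ch = channel_weights(feat, w)
    sp = spatial_weights(feat, v)
    return [[[feat[c][i][j] * ch[c] * sp[i][j] for j in range(len(sp[0]))]
             for i in range(len(sp))] for c in range(len(feat))]

# Toy feature map with 2 channels of size 2x2 (values are arbitrary).
feat = [[[1.0, -2.0], [0.5, 3.0]], [[0.0, 1.0], [-1.0, 2.0]]]
w = [[0.5, -0.5], [1.0, 1.0]]
v = [1.0, -1.0]
out = blended_attention(feat, w, v)
```

Because both sets of weights lie strictly between 0 and 1, the output keeps the shape and signs of the input while shrinking activations in channels and locations that receive low attention. In a recurrent setting, a step like this would be applied at every time step of the sequence.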

