Abstract
Automated detection of aggressive and violent behaviour in videos has immense potential. It enables efficient online content filtering by restricting access to extreme content and, when integrated with security systems, helps to monitor violence in surveillance videos. In this work, a convolutional neural network is combined with the proposed Spatial and Channel wise Attention‐based ConvLSTM encoder (SCan‐ConvLSTM). The proposed architecture performs an efficient spatiotemporal fusion of the features extracted from video sequences containing fight scenes. To focus selectively on the most important regions, this blended attention mechanism adjusts the weights of outputs at different locations and across different channels. This recurrent attention mechanism enhances the sequential refinement of activation maps and boosts model performance. Finally, experimental results are presented which show that the proposed architecture achieves superior results on the benchmark datasets (RWF‐2000, Violent‐flow, Hockey‐fights, and Movies).
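The idea of reweighting a feature map across channels and across spatial locations can be illustrated with a minimal NumPy sketch. This is not the SCan‐ConvLSTM encoder itself: the abstract does not give the layer definitions, so the function names, the bottleneck projection (`w1`, `w2`), the per-location projection vector `v`, and the order in which the two attentions are applied are all assumptions chosen for illustration. The recurrent ConvLSTM part is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Return one weight in (0, 1) per channel (hypothetical formulation)."""
    # feat has shape (C, H, W); global average pooling squeezes H and W.
    squeezed = feat.mean(axis=(1, 2))          # shape (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)    # ReLU bottleneck, shape (R,)
    return sigmoid(w2 @ hidden)                # shape (C,)

def spatial_attention(feat, v):
    """Return one weight in (0, 1) per spatial location (hypothetical formulation)."""
    # Projecting the channel axis with v acts like a 1x1 convolution.
    scores = np.tensordot(v, feat, axes=(0, 0))  # shape (H, W)
    return sigmoid(scores)

def blended_attention(feat, w1, w2, v):
    """Apply channel weights, then spatial weights, to a (C, H, W) feature map."""
    ch = channel_attention(feat, w1, w2)
    refined = feat * ch[:, None, None]         # broadcast over H and W
    sp = spatial_attention(refined, v)
    return refined * sp[None, :, :]            # broadcast over channels

# Toy input: random values stand in for CNN features of a single frame.
rng = np.random.default_rng(0)
C, H, W, R = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((R, C))
w2 = rng.standard_normal((C, R))
v = rng.standard_normal(C)
out = blended_attention(feat, w1, w2, v)
```

Because both sets of weights lie strictly between 0 and 1, the output keeps the input shape and each activation is scaled down by an amount that depends on its channel and its location. In a recurrent setting, such weights would be recomputed at every time step from the current features and hidden state.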