Abstract

Surveillance cameras are installed across various sectors of a smart city to capture ongoing events for monitoring purposes. The analysis of these surveillance videos is an important research topic that spans activity recognition, object detection, anomaly recognition, and related problems. Among these, anomaly recognition is the most common task in a smart city and has received significant attention with the aim of ensuring public safety and security. Many works have been published in this field, but existing schemes have not achieved the desired detection performance. Mainstream anomaly recognition methods depend heavily on strong supervision to reach satisfactory performance, which makes them time-consuming and impractical. With a particular focus on this problem, this article presents a novel deep convolutional neural network (CNN)-based anomaly recognition model, in which deep features are extracted from surveillance video frames. These features are forwarded to the proposed temporal convolutional network (TCN), which includes a multi-head attention module, to recognise anomalies in these videos. The multi-head temporal attention mechanism enables the model to capture the most salient temporal information in complex surveillance environments. Experiments conducted on standard datasets and a comparison with state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed framework, which achieves increases in accuracy of 0.9%, 1.9%, 0.65%, 0.27%, and 1.5% on the UCF-Crime2local, LAD-2000, RWF-2000, RLVS, and Crowd Violence datasets, respectively. These outcomes indicate the suitability of our method for deployment in real-time surveillance schemes.
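
To make the described pipeline concrete, the following is a minimal sketch of the overall architecture the abstract outlines: per-frame CNN features fed into a TCN of dilated temporal convolutions followed by multi-head temporal attention and a clip-level classifier. All layer names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class TemporalAttentionTCN(nn.Module):
    """Hypothetical sketch: per-frame CNN features -> dilated temporal
    convolutions (TCN) -> multi-head temporal attention -> anomaly logits.
    Sizes are placeholders, not the paper's reported settings."""

    def __init__(self, feat_dim=2048, hidden_dim=256, num_heads=4,
                 num_levels=3, num_classes=2):
        super().__init__()
        # Project CNN features (e.g. from a pretrained backbone) to a smaller width.
        self.proj = nn.Linear(feat_dim, hidden_dim)

        # Stack of dilated 1-D convolutions: the receptive field grows with
        # each level, so later layers see longer temporal context.
        layers = []
        for i in range(num_levels):
            dilation = 2 ** i
            layers += [
                nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3,
                          padding=dilation, dilation=dilation),
                nn.ReLU(),
            ]
        self.tcn = nn.Sequential(*layers)

        # Multi-head attention over the temporal axis lets the model weight
        # the frames that matter most for the anomaly decision.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) per-frame CNN features
        x = self.proj(frame_feats)                       # (B, T, H)
        x = self.tcn(x.transpose(1, 2)).transpose(1, 2)  # temporal convolutions
        x, _ = self.attn(x, x, x)                        # temporal self-attention
        x = x.mean(dim=1)                                # pool over time
        return self.classifier(x)                        # clip-level logits


# Example: a batch of 8 clips, 32 frames each, 2048-d features per frame.
model = TemporalAttentionTCN()
logits = model(torch.randn(8, 32, 2048))
print(logits.shape)  # torch.Size([8, 2])
```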
