Abstract

In this paper, we propose a new method for detecting abnormal human behavior from skeleton features using self-attention augmented graph convolution. Skeleton data have been shown to be robust to complex backgrounds, illumination changes, and dynamic camera scenes, and they are naturally represented as a graph in non-Euclidean space. In particular, spatial temporal graph convolutional networks (ST-GCN) can effectively learn the spatio-temporal relationships of such non-Euclidean structured data. However, ST-GCN operates only on local neighborhood nodes and therefore lacks global information. We propose a novel spatial temporal self-attention augmented graph convolutional network (SAA-Graph) that combines an improved spatial graph convolution operator with a modified transformer self-attention operator to capture both local and global information about the joints. The spatial self-attention augmented module models the intra-frame relationships between human body parts. To the best of our knowledge, we are the first to use self-attention to enhance spatial temporal graph convolution for video anomaly detection. To validate the proposed model, we performed extensive experiments on two large-scale public benchmark datasets, ShanghaiTech Campus and CUHK Avenue; the results show state-of-the-art performance for our approach compared with existing skeleton-based and graph convolution methods.
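The contrast between a local graph convolution and global self-attention can be illustrated on a single skeleton frame. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a plain symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2 instead of ST-GCN's partitioned adjacency, a single attention head, no temporal convolution, and concatenation of the two branches along the channel axis. The functions `normalized_adjacency` and `saa_spatial_layer` and the 5-joint skeleton are hypothetical.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a skeleton graph (assumed form)."""
    a = np.eye(num_joints)                      # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0                 # undirected bone between joints i and j
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def saa_spatial_layer(x, a_norm, w_gcn, w_q, w_k, w_v):
    """Hypothetical self-attention augmented spatial layer for one frame.

    x: (V, C_in) joint features.
    Local branch: graph convolution, mixes only physically connected joints.
    Global branch: scaled dot-product self-attention, mixes all joint pairs.
    Returns the branches concatenated on the channel axis, plus the attention map.
    """
    local = a_norm @ x @ w_gcn                      # (V, C_g)
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # (V, d) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))   # (V, V), each row sums to 1
    global_feat = attn @ v                          # (V, d)
    return np.concatenate([local, global_feat], axis=1), attn

# Toy usage: a hypothetical 5-joint skeleton with (x, y, confidence) per joint.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
a_norm = normalized_adjacency(edges, 5)
x = rng.normal(size=(5, 3))
out, attn = saa_spatial_layer(
    x, a_norm,
    rng.normal(size=(3, 8)),   # w_gcn
    rng.normal(size=(3, 4)),   # w_q
    rng.normal(size=(3, 4)),   # w_k
    rng.normal(size=(3, 4)),   # w_v
)
print(out.shape)  # (5, 12)
```

In this toy graph, joints 0 and 2 are not connected, so `a_norm[0, 2]` is zero and the graph convolution cannot mix them directly, whereas `attn[0, 2]` is always positive. This is the locality gap that the self-attention branch is meant to close.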

Highlights

  • Video anomaly detection is a highly challenging task in unsupervised video analysis

  • The key contributions of this work are as follows: (1) we propose a novel spatial temporal self-attention augmented graph convolutional clustering network for skeleton-based video anomaly detection, which uses a spatial temporal self-attention augmented graph convolutional autoencoder for feature extraction and embedded clustering; (2) we design a new spatial self-attention augmented graph convolution operator that models the intra-frame interactions between different body parts and captures both local and global features of the skeleton in each frame; (3) our model achieves a state-of-the-art area under the ROC curve (AUC) of 0.789 on the ShanghaiTech Campus anomaly detection dataset and performs strongly on the CUHK Avenue dataset

  • We show that SAA-Graph convolution achieves a more flexible and dynamic representation of skeletons while overcoming the locality of graph convolution
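The "embedded clustering" step can be sketched with the standard Deep Embedded Clustering (DEC) formulation: a Student's t soft assignment of each latent embedding to cluster centroids, a sharpened target distribution, and a KL-divergence clustering loss. This is a minimal NumPy sketch under the assumption that SAA-Graph follows the standard DEC form; the autoencoder that produces the embeddings `z` is omitted, and the toy data are hypothetical.

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of embedding z_i to centroid mu_j (DEC form)."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)                     # each row sums to 1

def target_distribution(q):
    """Sharpened target p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), with f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), the clustering objective that DEC minimizes."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy usage: two well-separated groups of 2-D embeddings (hypothetical data).
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.1, (4, 2)), rng.normal(3.0, 0.1, (4, 2))])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
q = soft_assign(z, mu)
p = target_distribution(q)
loss = kl_clustering_loss(p, q)
```

Because `p` squares and renormalizes `q`, confident assignments become more confident. Minimizing KL(P || Q) therefore pulls embeddings toward their clusters, and the resulting soft assignments can feed a downstream normality score.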


Summary

Introduction

Video anomaly detection is a highly challenging task in unsupervised video analysis. In recent years, surveillance video anomaly detection has gained widespread attention owing to its applications in public security and social security management, and to advances in deep learning and computer vision. We use self-attention to overcome the locality of the graph convolution operator by capturing global information in the skeleton data. The key contributions of this work are as follows: (1) we propose a novel spatial temporal self-attention augmented graph convolutional clustering network for skeleton-based video anomaly detection, which uses a spatial temporal self-attention augmented graph convolutional autoencoder for feature extraction and embedded clustering; (2) we design a new spatial self-attention augmented graph convolution operator that models the intra-frame interactions between different body parts and captures both local and global features of the skeleton in each frame; (3) our model achieves a state-of-the-art AUC of 0.789 on the ShanghaiTech Campus anomaly detection dataset and performs strongly on the CUHK Avenue dataset. Pose graphs combined with a Dirichlet process mixture have also been used for video anomaly detection, together with a new coarse-grained setting for exploring broader aspects of the task.
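The AUC reported above is the area under the ROC curve, which equals the probability that a randomly chosen anomalous frame receives a higher anomaly score than a randomly chosen normal frame, with ties counted as one half. The following minimal sketch computes it directly from that definition. It assumes per-frame binary labels (1 = anomalous) pooled into one array; the exact evaluation protocol (pooling over all frames versus averaging per video, score smoothing) is not stated here. If a model outputs normality scores, where higher means more normal, they must be negated first. The function name `roc_auc` and the toy data are hypothetical.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as P(score of anomalous frame > score of normal frame), ties counted 1/2.

    labels: 1 for anomalous frames, 0 for normal frames.
    scores: higher means more anomalous.
    Uses an O(P * N) pairwise comparison, which is fine for a sketch.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both normal and anomalous frames")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy usage with hypothetical per-frame normality scores.
labels = [0, 0, 1, 1, 0, 1]                              # 1 = anomalous frame
normality = np.array([0.9, 0.8, 0.2, 0.4, 0.3, 0.1])     # higher = more normal
auc = roc_auc(labels, -normality)                        # negate: higher = more anomalous
print(round(auc, 4))  # 0.8889
```

Eight of the nine (anomalous, normal) pairs are ranked correctly here, since the anomalous frame with normality 0.4 looks more normal than the normal frame with 0.3. This gives 8/9. Production code would typically use a sort-based method, or a library routine such as scikit-learn's `roc_auc_score`, to avoid the quadratic comparison.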

Skeleton-Based Action Recognition
Transformer
Graph Convolutional Neural Networks
Proposed Method
Spatiotemporal Graph Connection Configuration for Skeleton
Spatial Graph Convolution
Deep Embedded Clustering
Normality Score
Experiment
Dataset
Implementation Details
Comparison with State-of-the-Art Methods
Ablation Study
The Visualization of SAA-Graph
Findings
Conclusions