Abstract

The human skeleton carries significant information about actions, so it is natural to incorporate skeletons in human action recognition. A skeleton resembles a graph in which body joints and bones correspond to graph nodes and edges. This resemblance is the main motivation for applying graph convolutional neural networks to human action recognition. Results show that different joints do not contribute equally to discriminating different actions. Therefore, we propose to use attention-joints, the joints that contribute significantly to specific actions. Features are computed only for these attention-joints and assigned as node features of the graph. In our method, the node features (also termed attention-joint features) include i) the distances of attention-joints from the center of gravity of the human body, ii) the distances between adjacent attention-joints, and iii) joint flow features. The proposed method gives a simple but more efficient representation of skeleton sequences by concatenating more relative distances and relative coordinates to other joints. The proposed methodology has been evaluated on the single-image Stanford 40-Actions dataset, as well as on the temporal skeleton-based action recognition datasets PKU-MMD and NTU-RGBD. Results show that this framework outperforms existing state-of-the-art methods.
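
The three attention-joint feature types listed above can be illustrated with a minimal sketch. It assumes that the center of gravity is approximated by the mean of all joint coordinates and that joint flow is the frame-to-frame displacement of each attention-joint. Neither definition is given in this abstract, and the function and parameter names are hypothetical.

```python
import numpy as np

def attention_joint_features(joints, prev_joints, attention_idx, adjacent_pairs):
    """Compute illustrative attention-joint features for one frame.

    joints, prev_joints: (J, D) arrays of joint coordinates (current, previous frame).
    attention_idx: indices of the joints treated as attention-joints.
    adjacent_pairs: (a, b) index pairs of adjacent attention-joints.
    """
    joints = np.asarray(joints, dtype=float)
    prev_joints = np.asarray(prev_joints, dtype=float)

    # Assumption: center of gravity approximated by the mean of all joints.
    center = joints.mean(axis=0)
    att = joints[attention_idx]

    # i) distance of each attention-joint from the center of gravity
    dist_to_center = np.linalg.norm(att - center, axis=1)

    # ii) distance between each pair of adjacent attention-joints
    pair_dist = np.array(
        [np.linalg.norm(joints[a] - joints[b]) for a, b in adjacent_pairs]
    )

    # iii) assumption: joint flow as displacement since the previous frame
    flow = att - prev_joints[attention_idx]

    return dist_to_center, pair_dist, flow
```

For a single image there is no previous frame, so only the two distance features would apply in that setting.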

Highlights

  • Human action recognition in videos has numerous practical applications such as video surveillance, video content analysis, health-care and entertainment

  • Graph Convolutional Neural Networks (GNN) make CNNs applicable to non-Euclidean domains such as graphs with arbitrary nodes and edges

  • The above-mentioned approaches for action recognition are based on convolutional neural networks, whereas we are the first to address the single-image action recognition problem using an attention-joints based graph CNN, achieving state-of-the-art performance of 84.8%


Summary

INTRODUCTION

Human action recognition in videos has numerous practical applications such as video surveillance, video content analysis, health care and entertainment. Although human skeletons can be represented as graphs, applying Convolutional Neural Networks (CNN) directly to human skeletons is not intuitive. Graph Convolutional Neural Networks (GNN) make CNNs applicable to non-Euclidean domains such as graphs with arbitrary nodes and edges. Graph convolutional networks, also known as geometric CNNs, can be applied to node classification and link prediction in non-Euclidean settings such as social networks, molecular biology and brain-signal processing. They extract high-level features from graphs and suit applications such as skeleton-based human action recognition, where body joints correspond to nodes and the bones between joints correspond to edges. The third stated contribution is a new attention-joints graph convolutional neural network designed for skeleton-based action recognition, which achieves state-of-the-art performance on three public benchmarks.
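
To make the joints-as-nodes, bones-as-edges idea concrete, here is a minimal sketch of one graph convolution step over a skeleton. It uses the widely used symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), which is an assumption here and not necessarily the layer used in the paper. The function names and the toy three-joint skeleton are hypothetical.

```python
import numpy as np

def normalized_adjacency(num_joints, bones):
    """Build D^-1/2 (A + I) D^-1/2 from a list of (joint_a, joint_b) bones."""
    A = np.zeros((num_joints, num_joints))
    for a, b in bones:
        A[a, b] = A[b, a] = 1.0  # bones are undirected edges
    A_hat = A + np.eye(num_joints)  # self-loops keep each joint's own features
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(H, A_norm, W):
    """One layer: aggregate neighbor features, project with W, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy skeleton: three joints in a chain (0 - 1 - 2), 4 features per joint.
A_norm = normalized_adjacency(3, [(0, 1), (1, 2)])
H = np.random.default_rng(0).normal(size=(3, 4))  # node (joint) features
W = np.random.default_rng(1).normal(size=(4, 8))  # layer weights (random here)
out = graph_conv(H, A_norm, W)  # shape (3, 8): one 8-dim feature row per joint
```

In a full model the weights would be learned and several such layers stacked before a classifier; the sketch only shows how features propagate along bones.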

RELATED WORK
ATTENTION NETWORK
ATTENTION JOINTS ENCODING
Findings
CONCLUSION
