Abstract

Graph Convolutional Networks (GCNs) have attracted considerable attention and shown remarkable performance for action recognition in recent years. To improve recognition accuracy, the key problems for this class of methods are how to build the graph structure adaptively, how to select key frames, and how to extract discriminative features. In this work, we propose novel Adaptive Attention Memory Graph Convolutional Networks (AAM-GCN) for human action recognition using skeleton data. We adopt a GCN to adaptively model the spatial configuration of skeletons and employ a Gated Recurrent Unit (GRU) to construct an attention-enhanced memory for capturing temporal features. With the memory module, our model can not only remember what happened in the past but also exploit future information through multiple bidirectional GRU layers. Furthermore, to extract discriminative temporal features, an attention mechanism is employed to select key frames from the skeleton sequence. Extensive experiments on the Kinetics, NTU RGB+D and HDM05 datasets show that the proposed network achieves better performance than several state-of-the-art methods.
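As a minimal sketch of what "adaptively model the spatial configuration" can mean in code, the layer below combines a fixed skeletal adjacency with a learnable offset so that latent joint dependencies can emerge during training. The class name `AdaptiveGraphConv`, the layer sizes, and the specific parameterization are illustrative assumptions, not the paper's exact design.

```python
# Illustrative adaptive graph convolution: a fixed physical adjacency A plus
# a learnable offset B, so dependencies between unconnected joints can be
# learned during training. Not the paper's exact layer.
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_joints, A_skeleton):
        super().__init__()
        self.register_buffer("A", A_skeleton)                       # fixed physical graph
        self.B = nn.Parameter(torch.zeros(num_joints, num_joints))  # learned offset
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):                               # x: (batch, T, N, in_ch)
        A_adapt = self.A + self.B                       # adapted adjacency
        x = torch.einsum("uv,btvc->btuc", A_adapt, x)   # aggregate over neighbor joints
        return self.proj(x)

# Usage with an identity adjacency as a stand-in for a real skeleton graph:
layer = AdaptiveGraphConv(3, 16, num_joints=25, A_skeleton=torch.eye(25))
out = layer(torch.randn(2, 30, 25, 3))                  # -> (2, 30, 25, 16)
```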

Highlights

  • Research on human action recognition has become one of the most active topics in computer vision in recent years

  • In GCN-based action recognition works [18,20], the dynamics of a human skeleton sequence with N joints and T frames are denoted as a spatial-temporal graph G = (V, E); see the code sketch after this list

  • The results demonstrate that information from both the past and the future is helpful for action recognition
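To make the graph definition above concrete, here is a minimal sketch of building and normalizing the intra-frame adjacency of such a spatial-temporal graph. The 5-joint edge list is illustrative, not any dataset's actual skeleton layout.

```python
# Minimal sketch of the spatial part of the graph G = (V, E) over N joints.
import numpy as np

N = 5  # number of joints per frame (illustrative)
T = 3  # number of frames (illustrative)

# Intra-frame (spatial) edges: physical bone connections.
spatial_edges = [(0, 1), (1, 2), (1, 3), (1, 4)]

A = np.zeros((N, N))
for i, j in spatial_edges:
    A[i, j] = A[j, i] = 1.0
A += np.eye(N)  # self-loops so each joint keeps its own features

# Symmetric normalization A_hat = D^{-1/2} A D^{-1/2}, as commonly used in GCNs.
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# Inter-frame (temporal) edges connect the same joint across consecutive
# frames; in practice this is handled by a temporal module (e.g. a GRU)
# rather than an explicit (N*T) x (N*T) adjacency.
```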


Summary

Introduction

Research on human action recognition has become one of the most active topics in computer vision in recent years. Our method employs graph convolution to adaptively construct the spatial configuration within each frame and uses multiple bidirectional GRU layers to extract temporal information. The advantages of the proposed method are as follows. First, the constructed adaptive graph can effectively capture latent dependencies between arbitrary joints, including joints that have no physical connection but are strongly correlated in an action; this better matches real actions, which require the collaboration of different body parts. Second, the proposed AAM-GCN network models dynamic skeletons for action recognition by constructing the graph structure adaptively during training and explicitly exploring the latent dependencies among joints. Third, by constructing an attention-enhanced memory, AAM-GCN can selectively focus on key frames and capture long-range discriminative temporal features from both the past and the future. Finally, we conduct an ablation study to demonstrate the effectiveness of each individual part of our model
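The sketch below illustrates the attention-enhanced memory idea: a bidirectional GRU reads per-frame skeleton features (so the model sees both past and future context), and a soft attention over frames selects key frames before classification. This is a minimal illustration rather than the authors' exact AAM-GCN; the class name `AttentionMemory`, the layer sizes, and the attention form are our assumptions.

```python
# Illustrative attention-enhanced memory over per-frame skeleton features:
# a 2-layer bidirectional GRU provides past and future context, and a soft
# attention over frames weights key frames before classification.
import torch
import torch.nn as nn

class AttentionMemory(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, num_classes=60):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)          # one score per frame
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                             # x: (batch, T, feat_dim)
        h, _ = self.gru(x)                            # (batch, T, 2*hidden)
        scores = torch.softmax(self.attn(h), dim=1)   # frame attention weights
        context = (scores * h).sum(dim=1)             # attention-weighted memory
        return self.fc(context)

# Usage: per-frame features from the spatial GCN, here random for illustration.
feats = torch.randn(8, 30, 64)                        # batch of 8, 30 frames
logits = AttentionMemory()(feats)                     # -> (8, 60)
```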

Related Works
Graph Convolutional Networks
Illustration
Adaptive
Attention
Model Architecture and Training Details
Experiments
Datasets
Comparisons with the State-of-the-Art Methods
Visualization of the Actions
Effect of Adaptive Graph
Effect of Bidirectional GRU
Comparison with Different Configurations
Effect of ASGC Concatenation
Other Parameters Evaluation
Findings
Conclusions and Future Work