Spatio-temporal SRU with global context-aware attention for 3D human action recognition

Qingshan She, Haitao Gan, Yingle Fan, Gaoyuan Mu

doi:10.1007/s11042-019-08587-w

Abstract

3D action recognition has attracted much attention in machine learning in recent years, and recurrent neural networks (RNNs) have been widely used for it because they process sequential data efficiently. However, to achieve good performance, traditional RNN architectures are usually slow in both training and inference. To address this problem, this paper proposes a global context-aware attention spatio-temporal SRU (GCA-ST-SRU) method for 3D action recognition, which extends the original simple recurrent unit (SRU) algorithm to the joint spatio-temporal domain and adds an attention mechanism. First, deep neural networks are employed to learn the features of the skeleton joints in each frame; these high-level feature sequences are then classified by the GCA-ST-SRU method, which can learn the spatio-temporal dependence between different joints in the same frame and pay more attention to informative joints. Extensive experiments were conducted on the UT-Kinect and SBU-Kinect Interaction datasets to evaluate the effectiveness of the proposed method. Compared with several existing algorithms, including SRU, long short-term memory (LSTM), spatio-temporal LSTM (ST-LSTM) and global context-aware attention LSTM (GCA-LSTM), our method exhibits better classification accuracy and computational efficiency. The experimental results demonstrate the effectiveness and practicability of our algorithm. Compared with methods of similar performance, our algorithm reduces training time and improves inference speed, and thus achieves a balance between speed and accuracy.
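For orientation, the basic SRU recurrence that the proposed method builds on can be sketched as follows. This is a minimal NumPy illustration of the original SRU only, not of GCA-ST-SRU: it has no spatial recurrence and no attention, and the function name `sru_forward`, the parameter names and the square weight shapes are assumptions made for this example. It shows why SRU is fast: every matrix product depends only on the inputs, so the time loop contains nothing but element-wise operations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(x, W, W_f, b_f, W_r, b_r):
    """Basic SRU layer over a sequence x of shape (T, d).

    W, W_f, W_r are (d, d) matrices and b_f, b_r are (d,) vectors
    (hypothetical names; square shapes keep the highway term valid).
    """
    x = np.asarray(x, dtype=float)
    # Input-only projections: computed for all time steps in one shot.
    x_tilde = x @ W.T
    f = sigmoid(x @ W_f.T + b_f)   # forget gate
    r = sigmoid(x @ W_r.T + b_r)   # reset (highway) gate
    c = np.zeros(x.shape[1])       # internal state, initialised to zero
    h = np.empty_like(x)
    for t in range(x.shape[0]):
        # Element-wise recurrence: no matrix product inside the loop.
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
    return h

# Usage on random data: 5 time steps, 4 features per step.
rng = np.random.default_rng(0)
T, d = 5, 4
x = rng.standard_normal((T, d))
W, W_f, W_r = (rng.standard_normal((d, d)) for _ in range(3))
h = sru_forward(x, W, W_f, np.zeros(d), W_r, np.zeros(d))
```

In an LSTM, by contrast, the gates at step t are computed from the previous hidden state, so the matrix products cannot be batched across time in this way.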

Full Text