End-to-end learning of deep convolutional neural network for 3D human action recognition

Chao Li, Wenqian Lin, Xin Min, Xianfu Zhang, Binling Nie, Shouqian Sun

doi:10.1109/icmew.2017.8026281

Abstract

Recently, skeleton-based human action recognition has received significant attention from various research communities because skeleton data offer a robust, succinct, and view-invariant representation. Most existing skeleton-based methods use either well-designed classifiers with hand-crafted features or recurrent neural networks (RNNs) to recognize human actions. In this paper, inspired by the breakthroughs of deep convolutional neural networks in the image domain, we transform a skeleton sequence into an image and perform end-to-end learning of a deep convolutional neural network (CNN). The image built from the skeleton sequence contains both spatial and temporal information. Our proposed method is tested on the NTU RGB+D dataset, which is so far the largest skeleton-based human action dataset, and achieves state-of-the-art performance in both the cross-view and cross-subject evaluations.
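As a rough illustration of how a skeleton sequence might be turned into an image, the sketch below maps a sequence of 3D joint coordinates to an 8-bit, three-channel array. The abstract does not specify the encoding, so every detail here is an assumption made for illustration: the function name `skeleton_to_image`, the layout (joints as rows, frames as columns, x/y/z as the three channels), and the per-channel min-max scaling to [0, 255]. It should not be read as the exact transformation used in the paper.

```python
import numpy as np


def skeleton_to_image(seq: np.ndarray) -> np.ndarray:
    """Map a skeleton sequence to an image-like array (illustrative only).

    seq: array of shape (frames, joints, 3) holding x, y, z per joint.
    Returns a uint8 array of shape (joints, frames, 3): each row is one
    joint (spatial axis), each column is one frame (temporal axis), and
    the three channels hold the scaled x, y, z coordinates.
    """
    if seq.ndim != 3 or seq.shape[2] != 3:
        raise ValueError("expected an array of shape (frames, joints, 3)")

    seq = seq.astype(np.float64)

    # Assumed normalization: scale each coordinate channel independently
    # to [0, 1] using its minimum and maximum over the whole sequence.
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    scaled = (seq - lo) / span

    # Quantize to 8 bits and put joints on the row axis.
    img = np.round(scaled * 255.0).astype(np.uint8)
    return img.transpose(1, 0, 2)


if __name__ == "__main__":
    # Synthetic stand-in for one sequence: 40 frames, 25 joints.
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(40, 25, 3))
    print(skeleton_to_image(demo).shape)
```

An array of this form can be passed to a standard image CNN after resizing to the network's input size. Placing time on one image axis and joints on the other is one way for 2D convolutions to see spatial and temporal structure together.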

Full Text