Sequential Deep Learning for Action Recognition with Synthetic Multi-view Data from Depth Maps

Bin Liang, Lihong Zheng, Xinying Li

doi:10.1007/978-981-13-6661-1_28

Abstract

Recurrent neural networks (RNNs) have recently proven successful in action recognition. However, depth sequences are high-dimensional and contain rich human dynamics, which makes it difficult for traditional RNNs to capture complex action information. This paper addresses the problem of human action recognition from sequences of depth maps using sequential deep learning. The proposed method first synthesizes multi-view depth sequences by rotating the 3D point clouds obtained from depth maps. Each depth sequence is then split into short-term temporal segments. For each segment, a multi-view depth motion template (MVDMT), which compresses the segment into a single motion template, is constructed as a short-term multi-view action representation. The MVDMT effectively characterizes the multi-view appearance and motion patterns within a short duration. Convolutional neural network (CNN) models are used to extract features from the MVDMTs, and a CNN-RNN network is then employed to learn an effective representation of the sequential patterns in the multi-view depth sequence. The proposed multi-view sequential deep learning framework can simultaneously capture spatio-temporal appearance and motion features in the depth sequence. The method has been evaluated on the MSR Action3D and MSR Action Pairs datasets, achieving promising results compared with state-of-the-art methods based on depth data.
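The first three steps of the pipeline (view synthesis, temporal segmentation, and motion-template construction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes an orthographic camera in which pixel coordinates and depth share one unit, a rotation about the vertical axis through the point-cloud centroid, and a motion template formed by summing absolute differences between consecutive frames. All function names, and the choice of rotation angles and segment length, are hypothetical.

```python
import numpy as np

def depth_to_points(depth):
    """Turn a depth map into an (N, 3) point cloud of (u, v, z).
    Assumption: orthographic camera; zero depth means 'no measurement'."""
    v, u = np.nonzero(depth)
    return np.stack([u, v, depth[v, u]], axis=1).astype(float)

def rotate_y(points, theta):
    """Rotate points by theta radians about the vertical axis through their centroid."""
    if len(points) == 0:
        return points
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    center = points.mean(axis=0)
    return (points - center) @ rot.T + center

def points_to_depth(points, shape):
    """Re-project points to a depth map, keeping the nearest point per pixel."""
    h, w = shape
    u = np.rint(points[:, 0]).astype(int)
    v = np.rint(points[:, 1]).astype(int)
    z = points[:, 2]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (z > 0)
    depth = np.full(shape, np.inf)
    np.minimum.at(depth, (v[ok], u[ok]), z[ok])  # z-buffer: smallest depth wins
    depth[np.isinf(depth)] = 0.0                 # pixels that received no point
    return depth

def synthesize_views(depth, angles_deg):
    """Render one synthetic depth map per rotation angle (in degrees)."""
    pts = depth_to_points(depth)
    return [points_to_depth(rotate_y(pts, np.deg2rad(a)), depth.shape)
            for a in angles_deg]

def split_segments(frames, length):
    """Split a frame sequence into non-overlapping segments of fixed length."""
    return [frames[i:i + length]
            for i in range(0, len(frames) - length + 1, length)]

def motion_template(segment):
    """Compress a segment into one image by accumulating absolute
    frame-to-frame differences (an assumed, simplified template)."""
    seg = np.asarray(segment, dtype=float)
    return np.abs(np.diff(seg, axis=0)).sum(axis=0)
```

Under these assumptions, a multi-view template for one segment would be obtained by applying `synthesize_views` to every frame, grouping the results by angle, and calling `motion_template` on each group. The resulting per-view images would then be passed to the CNN feature extractors described in the abstract.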

Full Text