Abstract

Skeleton-based human action recognition has been a popular research field in recent years. With the help of cameras equipped with depth sensors, such as the Kinect, a human action can be represented by a sequence of human skeleton data. Inspired by skeleton descriptors based on the Lie group, a spatial–temporal skeleton transformation descriptor (ST-STD) is proposed in this paper. The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement, and gives a comprehensive view of the skeleton in both the spatial and temporal domains for each frame. To capture the temporal dependencies in the skeleton sequence, a denoising sparse long short-term memory (DS-LSTM) network is proposed. The DS-LSTM is designed to address two problems in action recognition. First, to decrease intra-class diversity, a spatial–temporal auto-encoder (STAE) is proposed to generate representations with a higher level of abstraction; denoising and sparsity constraints are applied in both the spatial and temporal domains to enhance robustness and reduce action misalignment. Second, to model the action sequence, a three-layer LSTM is trained on the STAE representations for temporal modeling and classification. Experiments are carried out on four popular datasets. The results show that our approach outperforms several existing skeleton-based action recognition methods, which demonstrates the effectiveness of our method.
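To make the relative-transformation idea concrete, the following is a minimal sketch: each body part (bone) is assigned a local coordinate frame, and a pair of parts is then described by the rotation and translation (an element of SE(3)) that maps one frame onto the other. The function names, the bone-aligned axis convention, and the Gram–Schmidt frame construction below are illustrative assumptions, not the authors' exact formulation; the temporal half of ST-STD would apply the same relative transform to the same part across consecutive frames.

```python
import numpy as np

def part_frame(joint_a, joint_b):
    """Build a local frame for the body part running from joint_a to joint_b.

    The x-axis is aligned with the bone; the remaining axes are fixed by a
    consistent (but otherwise arbitrary) Gram-Schmidt choice. This convention
    is an assumption made for illustration.
    """
    x = joint_b - joint_a
    x = x / (np.linalg.norm(x) + 1e-8)
    # Pick a helper vector that is not parallel to the bone, then orthogonalize.
    helper = np.array([0.0, 0.0, 1.0]) if abs(x[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    z = np.cross(x, helper)
    z = z / (np.linalg.norm(z) + 1e-8)
    y = np.cross(z, x)
    R = np.stack([x, y, z], axis=1)  # columns are the frame axes (right-handed)
    return R, joint_a                # rotation and origin (translation) of the frame

def relative_transform(R1, t1, R2, t2):
    """Rotation and translation expressing part 1's frame in part 2's frame.

    A point p in frame 1 maps to R_rel @ p + t_rel in frame 2, so the pair
    (R_rel, t_rel) is the SE(3) element relating the two body parts.
    """
    R_rel = R2.T @ R1
    t_rel = R2.T @ (t1 - t2)
    return R_rel, t_rel
```

Stacking these relative rotations and translations over all part pairs in a frame yields a per-frame descriptor; in the Lie-group line of work this paper builds on, the rotations are typically mapped to the Lie algebra (e.g., via the matrix logarithm) to obtain a flat vector suitable as input to a network such as the STAE.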
