Abstract

Skeleton-based human action recognition plays a key role in various computer vision applications, such as smart surveillance, human-computer interaction, and medical rehabilitation. However, it remains a challenging task because of varying viewing angles, diverse body sizes, and occasionally noisy data. Existing deep learning-based methods require long training times and may fail to provide an interpretable descriptor that encodes the spatio-temporal features of a skeleton sequence. In this paper, a key-segment descriptor and a temporal step matrix model are proposed to represent spatio-temporal skeleton data semantically. First, a skeleton normalization is developed to make the skeleton sequence robust to absolute body size and initial body orientation. Second, the normalized skeleton data are divided into skeleton segments, which combine 3D skeleton pose and motion and are treated as action units. Each skeleton sequence is then encoded as a meaningful and characteristic key-segment sequence based on a key-segment dictionary formed from the segments of all training samples. Third, the temporal structure of the key-segment sequence is encoded into a step matrix by the proposed temporal step matrix model, and multiscale temporal information is stored in step matrices with various steps. Experimental results on three challenging datasets demonstrate that the proposed method outperforms all the compared hand-crafted methods and is comparable to recent deep learning-based methods.
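The normalization step is not specified in detail in this summary, so the following is only a minimal Python/NumPy sketch under stated assumptions: skeletons are stored as a `(T, J, 3)` array with `y` as the vertical axis, the joint indices `HIP_CENTER`, `SPINE`, `LEFT_HIP`, and `RIGHT_HIP` are hypothetical placeholders, and body size is measured by the first-frame hip-center-to-spine distance. It removes absolute position, rescales the body, and rotates about the vertical axis so that the initial hip line always faces the same direction.

```python
import numpy as np

# Hypothetical joint indices; real datasets (e.g. Kinect-based ones) define their own layouts.
HIP_CENTER, SPINE, LEFT_HIP, RIGHT_HIP = 0, 1, 2, 3


def normalize_skeleton(seq):
    """Normalize a (T, J, 3) skeleton sequence (assumes y is the vertical axis)."""
    seq = np.asarray(seq, dtype=float)

    # 1) Translation: the first-frame hip center becomes the origin.
    out = seq - seq[0, HIP_CENTER]

    # 2) Scale: divide by the first-frame hip-center-to-spine length,
    #    so the result no longer depends on absolute body size.
    out = out / np.linalg.norm(out[0, SPINE] - out[0, HIP_CENTER])

    # 3) Rotation about the vertical (y) axis so that the first-frame
    #    left-to-right hip vector points along +x (initial orientation).
    hip = out[0, RIGHT_HIP] - out[0, LEFT_HIP]
    phi = np.arctan2(hip[2], hip[0])  # angle of the hip vector in the x-z plane
    c, s = np.cos(phi), np.sin(phi)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])    # rotates the x-z plane by -phi
    return out @ rot.T                # applied to every joint of every frame
```

Because the translation, scale, and rotation are computed once from the first frame and applied to all frames, the motion within the sequence is preserved while the initial position, size, and orientation are factored out.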

Highlights

  • Human action recognition has become an important research topic in the field of computer vision, and has attracted considerable interest in the past few decades [1]–[6] due to its wide range of applications in smart surveillance, human-computer interaction and medical rehabilitation

  • We propose an effective yet simple skeleton sequence representation based on a sequence of atomic action units, namely, skeleton segments, which consist of several consecutive skeleton frames whose spatial variation is relatively small

  • The skeleton sequence is automatically divided into skeleton segments according to the segmentation scheme
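The highlights do not spell out the segmentation criterion, so the sketch below substitutes a simple greedy rule as an assumption: a new segment starts once the mean per-joint distance from the current segment's first frame exceeds a hypothetical threshold `tau`, which keeps the spatial variation within each segment small. The helper `segment_feature` (also hypothetical) combines a segment's mean 3D pose with its net motion, following the pose-plus-motion description in the abstract.

```python
import numpy as np


def segment_sequence(seq, tau=0.2):
    """Greedily split a (T, J, 3) sequence into (start, end) frame ranges.

    Hypothetical criterion: a segment is closed as soon as the mean
    per-joint distance to its first frame exceeds tau.
    """
    seq = np.asarray(seq, dtype=float)
    bounds = []
    start = 0
    for t in range(1, len(seq)):
        # Mean Euclidean displacement of all joints relative to the segment's first frame.
        drift = np.linalg.norm(seq[t] - seq[start], axis=-1).mean()
        if drift > tau:
            bounds.append((start, t))  # end index is exclusive
            start = t
    bounds.append((start, len(seq)))
    return bounds


def segment_feature(seq, start, end):
    """Concatenate the segment's mean pose and its net motion (last frame minus first)."""
    seg = np.asarray(seq, dtype=float)[start:end]
    pose = seg.mean(axis=0).ravel()
    motion = (seg[-1] - seg[0]).ravel()
    return np.concatenate([pose, motion])
```

A larger `tau` yields fewer, longer segments; a smaller one yields more, shorter segments that follow fine motion more closely.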


Summary

INTRODUCTION

R. Li et al.: Skeleton-Based Action Recognition With Key-Segment Descriptor and Temporal Step Matrix Model.

Human action recognition has become an important research topic in the field of computer vision and has attracted considerable interest over the past few decades [1]–[6] owing to its wide range of applications in smart surveillance, human-computer interaction, and medical rehabilitation. The recently emerging deep learning-based methods [9], [12], [13] encode spatio-temporal information simultaneously and work well in action recognition, but the features they learn have no interpretable physical meaning. In the proposed method, each skeleton sequence is divided into multiple skeleton segments based on the designed segmentation scheme, as shown in Fig. (b). This mirrors how humans can distinguish and perform different actions when given only a sequence of key static images together with the direction of movement, which can be regarded as the feature of a segment. The main body of the proposed method consists of three parts, each described in detail: skeleton normalization, the key-segment descriptor, and the temporal step matrix model.
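The key-segment descriptor and temporal step matrix model can be illustrated with the following sketch, which rests on two assumptions not confirmed by this summary: the key-segment dictionary is built with plain k-means over segment features pooled from all training sequences, and a step matrix `M_s[i, j]` counts (then normalizes) how often key segment `i` is followed by key segment `j` exactly `s` positions later in the key-segment sequence. Concatenating matrices for several steps gives a multiscale descriptor. All function names are hypothetical.

```python
import numpy as np


def encode(features, centers):
    """Map each segment feature to the index of its nearest key segment."""
    x = np.asarray(features, dtype=float)
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)  # (N, K) distances
    return d.argmin(axis=1)


def build_dictionary(features, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over segment features from all training samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=float)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = encode(x, centers)
        for c in range(k):
            members = x[labels == c]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[c] = members.mean(axis=0)
    return centers


def step_matrix(codes, k, step):
    """M[i, j] = fraction of pairs where key segment i occurs `step` positions before j.

    Assumes step >= 1; returns an all-zero matrix if the sequence is too short.
    """
    m = np.zeros((k, k))
    for a, b in zip(codes[:-step], codes[step:]):
        m[a, b] += 1
    total = m.sum()
    return m / total if total else m


def multiscale_descriptor(codes, k, steps=(1, 2, 3)):
    """Concatenate flattened step matrices for several steps (multiscale temporal info)."""
    return np.concatenate([step_matrix(codes, k, s).ravel() for s in steps])
```

The resulting fixed-length vector has `len(steps) * k * k` entries regardless of sequence length, so it could be passed to any standard classifier; the choice of classifier is left open in this sketch.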
