Abstract
Human action recognition based on skeleton information has drawn increasing attention in recent years, driven by the publication of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation of joint co-occurrences and the inter-frame representation of the skeleton's temporal evolution. The most effective existing approaches focus on automatic feature extraction using deep learning. However, they ignore the structural information of skeleton joints and the correlation between different skeleton joints. In this paper, we do not simply treat joint positions as unordered points. Instead, we propose a novel data reorganizing strategy to represent the global and local structure information of human skeleton joints. We also employ data mirroring to strengthen the relationships between skeleton joints. Based on this design, we propose an end-to-end multi-dimensional CNN network (SRNet) that fully considers spatial and temporal information to learn the feature extraction transform function. Specifically, this network applies different convolution kernels along different dimensions to learn a skeleton representation that makes the most of human structural information and generates robust features. Finally, we compare our method with other state-of-the-art methods on action recognition datasets including NTU RGB+D, PKU-MMD, SYSU, UT-Kinect, and HDM05. The experimental results demonstrate the superiority of our method.
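The data reorganizing and mirroring ideas described above can be illustrated with a minimal NumPy sketch. This is one plausible reading, not the paper's implementation: the body-part grouping, the joint indices, and the helper names `reorganize` and `mirror` are all hypothetical, and mirroring is assumed here to mean appending a reversed copy along the joint axis.

```python
import numpy as np

# Hypothetical grouping of 25 joints into five body parts.
# The indices are illustrative only, not the ordering used in the paper.
BODY_PARTS = {
    "torso": [0, 1, 2, 3, 20],
    "left_arm": [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}


def reorganize(seq):
    """Reorder the joint axis of a (T, J, C) sequence so that joints
    belonging to the same body part are adjacent (local structure),
    while the full sequence still covers the whole body (global structure)."""
    order = [j for part in BODY_PARTS.values() for j in part]
    return seq[:, order, :]


def mirror(seq):
    """Assumed form of data mirroring: append a reversed copy along the
    joint axis, so a convolution kernel also sees joints that were far
    apart in the original ordering next to each other."""
    return np.concatenate([seq, seq[:, ::-1, :]], axis=1)


rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 25, 3))  # 30 frames, 25 joints, xyz coordinates
out = mirror(reorganize(seq))
print(out.shape)  # (30, 50, 3)
```

The resulting array has a time axis and a joint axis, so a network could in principle apply differently shaped kernels along each of them, which is the role the abstract assigns to the multi-dimensional CNN.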
Highlights
Human action recognition plays a fundamental role in computer vision
We propose an end-to-end structured relevance feature learning network (SRNet), which uses different convolution kernels to preserve spatial and temporal information and thereby strengthens the role of the data reorganizing strategy in the training process
We propose a novel end-to-end structured relevance feature learning network (SRNet), which considers the structure, temporal, and correlation information of skeleton joints to ensure the robustness of the extracted features, building on the first contribution
Summary
Human action recognition plays a fundamental role in computer vision. Fast and reliable action recognition algorithms are in active demand in many areas, such as video surveillance and human-robot interaction [1], [27]. We propose an end-to-end structured relevance feature learning network (SRNet), which uses different convolution kernels to preserve spatial and temporal information and thereby strengthens the role of the data reorganizing strategy in the training process. We employ data mirroring to strengthen the relationships between skeleton joints by fully considering the structure information of the human body. This design effectively preserves local and global information about human actions during training. SRNet considers the structure, temporal, and correlation information of skeleton joints to ensure the robustness of the extracted features, building on the first contribution. Popular datasets are used to demonstrate the performance of the proposed method.