Abstract
Skeleton-based action recognition is widely used in action-related research because skeletal data provide clear structural features and are invariant to human appearance and illumination, which also improves the robustness of action recognition. Graph convolutional networks have been applied to such skeletal data to recognize actions, and recent studies show that they perform well when exploiting the spatial and temporal features of skeleton data. However, prevalent methods extract these spatial and temporal features by relying purely on a deep network to learn from primitive 3D joint positions. In this paper, we propose a novel action recognition method that applies high-order spatial and temporal features derived from skeleton data, such as velocity, acceleration, and relative distances between 3D joints. In addition, a multi-stream feature fusion method is adopted to fuse these high-order features. Extensive experiments on two large and challenging datasets, NTU-RGBD and NTU-RGBD-120, show that our model achieves state-of-the-art performance.
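The high-order features named above can be derived directly from the raw 3D joint sequence by temporal differencing and pairwise distances. Below is a minimal sketch, assuming a joint sequence of shape (T, V, 3) (T frames, V joints); the function and variable names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def high_order_features(joints):
    """Derive velocity, acceleration, and relative-distance features
    from a raw 3D joint sequence of shape (T, V, 3)."""
    # First-order temporal difference: per-joint velocity, shape (T, V, 3).
    velocity = np.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]

    # Second-order temporal difference: per-joint acceleration, shape (T, V, 3).
    acceleration = np.zeros_like(joints)
    acceleration[1:] = velocity[1:] - velocity[:-1]

    # Pairwise Euclidean distance between joints in each frame, shape (T, V, V).
    diff = joints[:, :, None, :] - joints[:, None, :, :]
    rel_distance = np.linalg.norm(diff, axis=-1)

    return velocity, acceleration, rel_distance

# Example: a random 64-frame sequence with 25 joints (an NTU-style skeleton).
seq = np.random.randn(64, 25, 3).astype(np.float32)
vel, acc, dist = high_order_features(seq)
print(vel.shape, acc.shape, dist.shape)  # (64, 25, 3) (64, 25, 3) (64, 25, 25)
```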
Highlights
We propose several high-order spatial and temporal features that are more effective for skeleton-based action recognition
By blending these high-order features, the deep network highlights the spatial and temporal changes of the 3D joints, which are crucial for action recognition
Summary
Action recognition is an important task in machine vision and can be applied in many scenarios, such as autonomous driving, security, and human-computer interaction. Several graph-based neural networks [6,7,8,9,10] are dedicated to learning both spatial and temporal features for action recognition, focusing on capturing the hidden relationships among vertices in space. The velocity, acceleration, and relative distance of each vertex can be extracted from the skeleton data. We propose several high-order spatial and temporal features that are important for skeletal analysis: velocity, acceleration, and relative distances between 3D joints. Such high-order motion features are nontrivial for a deep network to learn from raw 3D positions alone. Our method is evaluated on the NTU-RGBD and NTU-RGBD-120 datasets, where it achieves state-of-the-art performance on action recognition.
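Multi-stream fusion of this kind is commonly realized by training one network per feature stream and combining their class scores at test time. The following is a minimal late-fusion sketch under that assumption; the stream names and weights are illustrative, not values reported in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_streams(stream_logits, weights=None):
    """Late fusion: weighted sum of per-stream softmax scores.

    stream_logits: dict mapping stream name -> logits of shape (N, num_classes).
    Returns the predicted class index for each of the N samples.
    """
    names = list(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weighting by default
    fused = sum(weights[n] * softmax(stream_logits[n]) for n in names)
    return fused.argmax(axis=-1)

# Example with dummy logits for three hypothetical streams.
N, C = 4, 60  # 60 action classes, as in NTU-RGBD
logits = {s: np.random.randn(N, C) for s in ("position", "velocity", "acceleration")}
print(fuse_streams(logits, weights={"position": 1.0, "velocity": 0.5, "acceleration": 0.5}))
```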