Abstract
The human skeleton can be considered a tree-structured system of rigid bodies connected by joints. In recent research, substantial progress has been made in both theory and experiments on skeleton-based action recognition. However, it remains challenging to accurately represent the skeleton and to precisely eliminate noisy skeletons from an action sequence. This paper proposes a novel skeletal representation composed of two sub-features for recognizing human actions: a static feature and a dynamic feature. First, to avoid scale variations from subject to subject, the orientations of the rigid bodies in a skeleton are employed to capture the scale-invariant spatial information of the skeleton. The static feature of the skeleton is defined as a combination of these orientations. Unlike previous orientation-based representations, the orientation of a rigid body in the skeleton is defined as the rotations between the rigid body and the coordinate axes in three-dimensional space, and each rotation is mapped to the special orthogonal group SO(3). Next, the rigid-body motions between the skeleton and its preceding skeletons are utilized to capture the temporal information of the skeleton. The dynamic feature of the skeleton is defined as a combination of these motions, which are represented as points in the special Euclidean group SE(3). Therefore, the proposed skeleton representation lies in the Lie group (SE(3)×⋯×SE(3), SO(3)×⋯×SO(3)), which is a manifold, and an action can be considered a series of points in this Lie group. Then, to recognize human actions more accurately, a new pattern-growth algorithm named MinP-PrefixSpan is proposed to mine the key-skeleton-patterns from the training dataset. Because the algorithm reduces the number of new patterns in each growth step, it is more efficient than the PrefixSpan algorithm. Finally, the key-skeleton-patterns are used to discover the most informative skeletons of each action (skeleton sequence). Our approach achieves accuracies of 94.70%, 98.87%, and 95.01% on three action datasets, outperforming other related action recognition approaches, including LieNet, Lie group, Grassmann manifold, and graph-based models.
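The feature construction described above can be illustrated with a small sketch. The Python snippet below is a minimal illustration under assumed conventions, not the authors' implementation: skeletons are assumed to be (J, 3) arrays of joint coordinates and bones are assumed to be (parent, child) joint-index pairs; the function names are hypothetical. It builds a static feature by computing, for every bone, the rotations between the coordinate axes and the bone direction (points in SO(3)), and a dynamic feature by computing, for every bone, the rigid-body motion between the previous and current frame (points in SE(3)).

```python
# Minimal sketch (not the authors' code) of the two sub-features described above.
import numpy as np

def rotation_between(u, v):
    """Return the SO(3) rotation matrix that rotates unit(u) onto unit(v)
    (Rodrigues' rotation formula)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    w = np.cross(u, v)
    s, c = np.linalg.norm(w), float(np.dot(u, v))
    if s < 1e-8:
        if c > 0:                      # already aligned
            return np.eye(3)
        # opposite directions: rotate 180 degrees about any axis normal to u
        n = np.cross(u, np.array([1.0, 0.0, 0.0]))
        if np.linalg.norm(n) < 1e-8:
            n = np.cross(u, np.array([0.0, 1.0, 0.0]))
        n /= np.linalg.norm(n)
        return 2.0 * np.outer(n, n) - np.eye(3)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    return np.eye(3) + K + K @ K * ((1.0 - c) / s**2)

def static_feature(skeleton, bones):
    """Static feature: for every bone, the rotations between each coordinate
    axis and the bone direction -- a point in SO(3) x ... x SO(3)."""
    feats = []
    for p, c in bones:
        d = skeleton[c] - skeleton[p]
        feats.extend(rotation_between(axis, d) for axis in np.eye(3))
    return feats

def dynamic_feature(prev_skeleton, skeleton, bones):
    """Dynamic feature: for every bone, the rigid-body motion (R, t) in SE(3)
    that maps its pose in the previous frame to the current frame."""
    feats = []
    for p, c in bones:
        d_prev = prev_skeleton[c] - prev_skeleton[p]
        d_cur = skeleton[c] - skeleton[p]
        R = rotation_between(d_prev, d_cur)
        t = skeleton[p] - R @ prev_skeleton[p]   # translation of the proximal joint
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, t
        feats.append(T)
    return feats
```

Under these assumptions, an action is then the per-frame sequence of such static and dynamic features, i.e., a trajectory of points in the product Lie group mentioned in the abstract.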
Highlights
Human action recognition is currently one of the most active research topics in the field of computer vision, owing to its applications in intelligent surveillance, video games, robotics, and other fields
Because the human skeleton can generally be regarded as an articulated system of rigid segments connected by joints, human action can be viewed as a continuous evolution of the spatial configuration constructed by these rigid segments [2]
If human skeleton sequences can be accurately extracted from RGB videos, action recognition can be performed by classifying these sequences
Summary
Human action recognition is currently one of the most active research topics in the field of computer vision, owing to its applications in intelligent surveillance, video games, robotics, and other fields. If human skeleton sequences can be accurately extracted from RGB videos, action recognition can be performed by classifying these sequences. Using the proposed skeleton representation, a human action (skeleton sequence) can be represented as points in the Lie group. However, (1) it is typically a very complicated task to classify human actions represented by a Lie group directly, and (2) traditional approaches based on Lie groups [5, 13, 14] only consider the spatial information of a skeleton but ignore the temporal information between different skeletons. In this study, based on the PrefixSpan algorithm [15] in data mining, a new pattern-growth algorithm is proposed to mine the key-skeleton-patterns of each action class, and the key-skeleton-patterns are used to eliminate noisy skeletons
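To make the mining step concrete, the sketch below shows generic prefix-projection pattern growth in the spirit of PrefixSpan, on which MinP-PrefixSpan is stated to be based. It operates on sequences of discrete symbols (e.g., quantized skeleton poses); the function name, the symbol encoding, and the optional per-step cap on newly grown patterns are illustrative assumptions standing in for the pruning idea described in the abstract, not the authors' actual criterion or API.

```python
# Hedged sketch of prefix-projection pattern growth in the spirit of PrefixSpan.
from collections import defaultdict

def prefixspan(sequences, min_support, max_new_per_step=None):
    """Return frequent sequential patterns as {pattern_tuple: support}."""
    patterns = {}

    def grow(prefix, projected):
        # Count, per sequence, the distinct symbols that can extend the prefix.
        counts = defaultdict(int)
        for seq, start in projected:
            seen = set()
            for sym in seq[start:]:
                if sym not in seen:
                    counts[sym] += 1
                    seen.add(sym)
        frequent = [(s, c) for s, c in counts.items() if c >= min_support]
        # Optional cap on new patterns per growth step (illustrative stand-in
        # for the reduction idea attributed to MinP-PrefixSpan).
        if max_new_per_step is not None:
            frequent = sorted(frequent, key=lambda x: -x[1])[:max_new_per_step]
        for sym, support in frequent:
            new_prefix = prefix + (sym,)
            patterns[new_prefix] = support
            # Project each sequence on the new prefix and recurse.
            new_projected = []
            for seq, start in projected:
                for i in range(start, len(seq)):
                    if seq[i] == sym:
                        new_projected.append((seq, i + 1))
                        break
            grow(new_prefix, new_projected)

    grow((), [(seq, 0) for seq in sequences])
    return patterns

# Toy usage: mine patterns from symbol sequences with minimum support 2.
db = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "b", "c"]]
print(prefixspan(db, min_support=2))
```

In the paper's pipeline, the mined key-skeleton-patterns of each action class would then be used to keep the most informative skeletons of a sequence and discard noisy ones before classification.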