Abstract

Skeleton-based action recognition has recently become an important topic in computer vision. Accurately modeling a human action and precisely distinguishing similar actions remain challenging tasks. In this paper, an action (skeleton sequence) is represented as a third-order nonnegative tensor time series to preserve the original spatiotemporal information of the action. Because a linear dynamical system (LDS) is an efficient tool for encoding spatiotemporal data in various disciplines, this paper proposes a nonnegative tensor-based LDS (nLDS) to model the third-order nonnegative tensor time series. Nonnegative Tucker decomposition (NTD) is utilized to estimate the parameters of the nLDS model. These parameters are used to build the extended observability sequence O∞T of the action, so O∞T can serve as the action's feature descriptor. To avoid the limitations introduced by approximating O∞T with a finite-order matrix, we represent an action as a point on the infinite Grassmann manifold comprising the orthonormalized extended observability sequences. Classification is then performed by dictionary learning and sparse coding on the infinite Grassmann manifold. Experimental results on the MSR-Action3D, UTKinect-Action, and G3D-Gaming datasets demonstrate that the proposed approach performs better than state-of-the-art methods.
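The following is a minimal, illustrative sketch of the pipeline summarized above, not the authors' implementation: it assumes the tensorly library for NTD, uses arbitrary ranks and an arbitrary truncation order m, and approximates the infinite extended observability sequence by a finite truncation before orthonormalization.

```python
# Hedged sketch: NTD-based nLDS parameter estimation and a (truncated)
# observability-based Grassmann representation of one skeleton sequence.
# Tensor layout (joints x coords x frames), ranks, and m are illustrative
# assumptions, not values taken from the paper.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_tucker

def nlds_grassmann_point(skeleton_seq, ranks=(10, 3, 8), m=5):
    """skeleton_seq: nonnegative array of shape (joints, coords, frames)."""
    X = tl.tensor(np.asarray(skeleton_seq, dtype=float))

    # 1) NTD: X ~ core x1 U1 x2 U2 x3 U3, with nonnegative core and factors.
    core, (U1, U2, U3) = non_negative_tucker(X, rank=list(ranks), n_iter_max=200)

    # 2) Treat the rows of the temporal factor U3 (frames x r3) as hidden
    #    states and estimate a state-transition matrix A by least squares.
    states = U3.T                                   # (r3, frames)
    A = states[:, 1:] @ np.linalg.pinv(states[:, :-1])

    # 3) Observation map C: state -> vectorized frame, built from the spatial
    #    factors and the core (mode-3 unfolding of core x1 U1 x2 U2).
    spatial = tl.tenalg.multi_mode_dot(core, [U1, U2], modes=[0, 1])
    C = tl.unfold(spatial, mode=2).T                # (joints*coords, r3)

    # 4) Truncated extended observability matrix O_m = [C; CA; ...; CA^{m-1}],
    #    orthonormalized (QR) to yield a point on the Grassmann manifold.
    blocks, Ak = [], np.eye(A.shape[0])
    for _ in range(m):
        blocks.append(C @ Ak)
        Ak = Ak @ A
    Q, _ = np.linalg.qr(np.vstack(blocks))
    return Q                                        # orthonormal basis
```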

Highlights

  • Human action recognition based on spatiotemporal data has been one of the most prominent research topics owing to its applications in human-computer interfaces [1], gaming [2], and surveillance systems [3]

  • To verify the effectiveness of the infinite Grassmann manifold representation, we apply dictionary learning and sparse coding on the Grassmann manifold to classify nonnegative tensor-based actions represented as points on the manifold (see the sketch after these highlights)

  • Nonnegative Tucker decomposition (NTD) is employed to estimate the parameters of the nonnegative tensor-based LDS (nLDS) model
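The paper performs dictionary learning and sparse coding directly on the infinite Grassmann manifold; as a stand-in illustration only, the sketch below uses the common projection embedding P = Y Yᵀ so that an off-the-shelf Lasso solver can produce sparse codes, with one dictionary atom per training sample and SRC-style class-wise residual classification. The embedding, solver, and dictionary construction are assumptions, not the authors' formulation.

```python
# Hedged sketch: sparse-coding classification of Grassmann points via the
# projection embedding (vec(Y Y^T)) and scikit-learn's Lasso.
import numpy as np
from sklearn.linear_model import Lasso

def proj_embed(Y):
    """Map an orthonormal basis Y (n x p) to the vectorized projector Y Y^T."""
    return (Y @ Y.T).ravel()

def classify_grassmann(query_Y, train_Ys, train_labels, alpha=1e-2):
    """Sparse-code the query over training atoms; return the label of the
    class whose atoms give the smallest reconstruction residual."""
    D = np.stack([proj_embed(Y) for Y in train_Ys], axis=1)   # dictionary
    x = proj_embed(query_Y)
    codes = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(D, x).coef_

    labels = np.asarray(train_labels)
    best_label, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(x - D[:, mask] @ codes[mask])
        if residual < best_res:
            best_label, best_res = c, residual
    return best_label
```

In this sketch each training action contributes one atom; a learned dictionary (as in the paper) would replace `train_Ys` with optimized atoms, but the residual-based decision rule would remain the same.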


Summary

Introduction

Human action recognition based on spatiotemporal data has been one of the most prominent research topics owing to its applications in human-computer interfaces [1], gaming [2], and surveillance systems [3]. Over the past few decades, numerous methods have been proposed for recognizing human actions from monocular RGB videos [4]. Although several significant studies have been conducted, accurately recognizing human actions from RGB videos remains a challenging problem. Since a human skeleton can be viewed as an articulated system of rigid bodies connected by bone joints, a human action can be described as the spatiotemporal evolution of a series of skeletons. If human skeleton sequences can be accurately extracted from RGB videos, action recognition can be performed by classifying the skeleton sequences. Skeleton-based action recognition has therefore once again become an active area of research.
