Abstract

The ability of automated technologies to correctly identify a human's actions offers considerable scope for systems built around human-machine interaction, and automatic 3D Human Action Recognition has therefore seen significant research effort. In the work described here, everyday 3D human actions recorded in the NTU RGB+D dataset are identified using a novel structured-tree neural network. The nodes of the tree represent the skeleton joints, with the spine joint represented by the root. The connection between a child node and its parent is known as the incoming edge, while the reciprocal connection is known as the outgoing edge. The use of a tree structure leads to a system that maps intuitively to human movement. The classifier uses the change in displacement of joints and the change in the angles between incoming and outgoing edges as features for classifying the actions performed.
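
To make this structure concrete, the following minimal Python sketch shows one way the joint tree and the two feature types could be represented. The names (JointNode, displacement, edge_angle) and the joint labels are illustrative assumptions, not the authors' implementation.

    import numpy as np

    class JointNode:
        """A skeleton joint in the structured tree; the spine joint is the root.

        The edge from the parent to this node is its incoming edge; edges
        from this node to its children are its outgoing edges.
        """
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent            # None for the root (spine)
            self.children = []
            if parent is not None:
                parent.children.append(self)

    def displacement(pos_prev, pos_curr):
        """Change in a joint's 3D position between consecutive frames."""
        return np.linalg.norm(np.asarray(pos_curr) - np.asarray(pos_prev))

    def edge_angle(parent_pos, joint_pos, child_pos):
        """Angle (radians) between the incoming edge (parent -> joint)
        and one outgoing edge (joint -> child)."""
        incoming = np.asarray(joint_pos) - np.asarray(parent_pos)
        outgoing = np.asarray(child_pos) - np.asarray(joint_pos)
        cosine = np.dot(incoming, outgoing) / (
            np.linalg.norm(incoming) * np.linalg.norm(outgoing))
        return np.arccos(np.clip(cosine, -1.0, 1.0))

    # Example: a spine -> shoulder -> elbow chain (hypothetical coordinates)
    spine = JointNode("spine")
    shoulder = JointNode("shoulder", parent=spine)
    elbow = JointNode("elbow", parent=shoulder)
    angle = edge_angle((0, 0, 0), (0.2, 0.4, 0), (0.5, 0.4, 0))

Under this reading, each frame would contribute the change in every joint's displacement and the change in every incoming/outgoing edge angle, concatenated into the feature vector fed to the classifier.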

Highlights

  • Human Action Recognition (HAR) is a dynamic and challenging task in which an individual’s physical actions are identified

  • The HAR process involves several steps starting from harvesting human motion information from raw sensor data, through to correctly identifying the actions performed

  • A point cloud is a collection of 3D positions (x, y, z) that represents the surface of an object or plane

Introduction

Human Action Recognition (HAR) is a dynamic and challenging task in which an individual's physical actions are identified. The HAR process involves several steps, from harvesting human motion information from raw sensor data through to correctly identifying the actions performed. Researchers at MIT were among the earliest to develop such a technique, using a bottom-up approach to extrapolate 3D models from point clouds, enabling computer vision systems to automate the capture and processing of images [1] in applications such as surveillance. Point cloud systems are the most common HAR technique because they allow fast detection and identification of actions recorded in 3D video. A point cloud is a collection of 3D positions (x, y, z) that represents the surface of an object or plane. Point clouds can be used for calculations directly in or on the object, e.g. distances, diameters, curvatures or cubature.
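
As a small illustration of computing a quantity directly on a point cloud, the sketch below estimates an object's diameter as the largest pairwise distance among its sampled surface points; pairwise_diameter is a hypothetical helper, not from the cited work.

    import numpy as np

    def pairwise_diameter(points):
        """Diameter of a point cloud: the largest distance between any
        two of its (x, y, z) points. `points` has shape (N, 3)."""
        diffs = points[:, None, :] - points[None, :, :]   # (N, N, 3)
        dists = np.linalg.norm(diffs, axis=-1)            # (N, N)
        return dists.max()

    # 500 points sampled on a unit sphere's surface: diameter is close to 2
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(500, 3))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    print(pairwise_diameter(pts))

The brute-force (N, N) distance matrix is fine for the small clouds shown here; larger clouds would typically use a convex hull or spatial index instead.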
