Abstract

Due to the constantly increasing demand for automatic tracking and recognition systems, there is a need for more proficient, intelligent and sustainable human activity tracking. The main purpose of this study is to develop an accurate and sustainable human action tracking system that is capable of error-free identification of human movements irrespective of the environment in which those actions are performed. Therefore, in this paper we propose a stereoscopic Human Action Recognition (HAR) system based on the fusion of RGB (red, green, blue) and depth sensors. These sensors provide additional depth information, which enables three-dimensional (3D) tracking of every movement performed by humans. Human actions are tracked using four features, namely, (1) geodesic distance; (2) 3D Cartesian-plane features; (3) joints Motion Capture (MOCAP) features and (4) way-points trajectory generation. To represent these features in an optimized form, Particle Swarm Optimization (PSO) is applied. After optimization, a neuro-fuzzy classifier is used for classification and recognition. Extensive experimentation was performed on three challenging datasets: the Nanyang Technological University (NTU) RGB+D dataset, the University of Lincoln (UoL) 3D social activity dataset and the Collective Activity Dataset (CAD). Evaluation experiments showed that the fusion of vision sensors along with our unique features is an effective approach towards developing a robust HAR system, achieving a mean accuracy of 93.5% on the NTU RGB+D dataset, 92.2% on the UoL dataset and 89.6% on the Collective Activity dataset. The developed system can play a significant role in many computer vision-based applications, such as intelligent homes, offices and hospitals, and surveillance systems.
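
As a rough sketch of the optimization-plus-classification stage summarized above, the following Python snippet runs a standard global-best PSO loop to learn weights for a fused feature vector, with a toy nearest-centroid fitness standing in for the neuro-fuzzy classifier; the synthetic data, swarm parameters and fitness function are illustrative assumptions, not the authors' configuration.

    # Illustrative PSO over feature weights; the fitness, data and parameters are
    # placeholders and do not reproduce the paper's actual pipeline.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic "fused feature" vectors (rows = samples) and binary activity labels.
    X = rng.normal(size=(40, 8))
    y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)

    def fitness(weights: np.ndarray) -> float:
        # Toy fitness: accuracy of a nearest-centroid classifier on weighted features.
        Xw = X * weights
        c0, c1 = Xw[y == 0].mean(axis=0), Xw[y == 1].mean(axis=0)
        pred = (np.linalg.norm(Xw - c1, axis=1) < np.linalg.norm(Xw - c0, axis=1)).astype(int)
        return (pred == y).mean()

    # Standard global-best PSO update loop over candidate weight vectors in [0, 1].
    n_particles, n_dims, iters = 15, X.shape[1], 30
    pos = rng.uniform(0, 1, (n_particles, n_dims))
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()

    for _ in range(iters):
        r1, r2 = rng.uniform(size=pos.shape), rng.uniform(size=pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, 1)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()

    print("best toy fitness:", pbest_fit.max())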

Highlights

  • Vision-based Human–Computer Interaction (HCI) is a broad field covering many areas of computer vision, such as human action tracking, face recognition, gesture recognition, human–robot interaction and many more [1]

  • Human Action Recognition (HAR) can be precisely defined as tracking the motion of each and every observable body part involved in performing human actions and identifying the activities performed by humans [2]

  • We propose two full-body features, namely, geodesic distance and 3D Cartesian-plane features plus two skeletal joints-based features, namely, way-point trajectory generation and joints Motion Capture (MOCAP) features


Introduction

Vision-based Human–Computer Interaction (HCI) is a broad field covering many areas of computer vision, such as human action tracking, face recognition, gesture recognition, human–robot interaction and many more [1]. In our proposed methodology, we focus on vision-based human motion analysis and representation for Human Action Recognition (HAR). HAR can be precisely defined as tracking the motion of every observable body part involved in performing human actions and identifying the activities performed [2]. The shape and motion information of each way-point trajectory generated from subsets of skeletal joints is recorded with each changing frame. The RGB silhouettes for all three datasets are extracted using a background subtraction method [69]. In this frame-difference technique, the pixels of the current frame of each interaction class are subtracted from the pixels of a background frame [70]. Denoting the pixels of the current frame by P[I(t)] and the pixels of the background frame by P[B], the resulting foreground frame P[F(t)] is obtained as given in Equation (1): P[F(t)] = P[I(t)] − P[B] [69].
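
As a minimal illustration of the frame difference in Equation (1), the sketch below applies it to two grayscale frames stored as NumPy arrays; the silhouette_from_frame helper and the threshold value are illustrative assumptions, not taken from the paper.

    import numpy as np

    def silhouette_from_frame(current_frame: np.ndarray,
                              background_frame: np.ndarray,
                              threshold: int = 30) -> np.ndarray:
        """Binary silhouette mask from P[F(t)] = P[I(t)] - P[B], followed by thresholding."""
        # Per-pixel absolute difference between the current frame P[I(t)] and the background P[B].
        diff = np.abs(current_frame.astype(np.int16) - background_frame.astype(np.int16))
        # Pixels whose change exceeds the threshold are kept as foreground (the human silhouette).
        return (diff > threshold).astype(np.uint8)

    # Toy usage: a bright region appearing against a static background becomes the silhouette.
    background = np.zeros((120, 160), dtype=np.uint8)
    current = background.copy()
    current[40:80, 60:100] = 200
    mask = silhouette_from_frame(current, background)
    print(mask.sum())  # 1600 foreground pixels (the 40 x 40 region)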
