Abstract

Three-dimensional (3D) human skeleton extraction is a powerful tool for activity acquisition and analysis, with applications in somatosensory control, virtual reality and many other growing fields. However, 3D human skeletonization relies heavily on RGB-Depth (RGB-D) cameras, expensive wearable sensors and specific lighting conditions, which greatly limits its outdoor applications. This paper presents a novel 3D human skeleton extraction method designed for large-scale outdoor scenarios captured with a monocular camera. The proposed algorithm aggregates spatial–temporal discrete joint positions extracted from the human shadow on the ground. First, the projected silhouette information is recovered from the human shadow on the ground in each frame, followed by the extraction of two-dimensional (2D) projected joint positions. Then, the extracted 2D joint positions are grouped into sets according to activity silhouette categories. Finally, spatial–temporal integration of same-category 2D joint positions is carried out to generate 3D human skeletons. Comparisons with the traditional RGB-D method show that the proposed method is accurate and efficient for outdoor human skeletonization. The application of the proposed method to RGB-D skeletonization enhancement is also discussed.

Highlights

  • The development of three-dimensional (3D) human skeleton extraction contributes enormously to growing fields such as virtual reality and somatosensory human–computer interaction

  • This paper mainly focuses on the extraction and aggregation of the extra silhouette information from spatial–temporal discrete human shadows on the ground, aiming to perform 3D human skeletonization with a monocular camera in outdoor scenarios

  • For each skeleton extracted from an effective frame, joint positions are normalized relative to the hip center, avoiding deviations introduced by different shot distances
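The hip-center normalization mentioned in the last highlight can be illustrated with a minimal sketch. It assumes the normalization is a pure translation that moves the hip center to the origin; the source does not say whether any rescaling is also applied. The function name `normalize_to_hip` and the joint names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Joint = Tuple[float, float, float]


def normalize_to_hip(skeleton: Dict[str, Joint],
                     hip_key: str = "hip_center") -> Dict[str, Joint]:
    """Translate every joint so that the hip center sits at the origin.

    Assumption: normalization is a translation only (no rescaling).
    """
    hx, hy, hz = skeleton[hip_key]
    return {name: (x - hx, y - hy, z - hz)
            for name, (x, y, z) in skeleton.items()}


# The same pose observed at two different positions (hypothetical values).
near = {
    "hip_center": (0.0, 0.0, 1.0),
    "head": (0.0, 0.0, 1.75),
    "left_foot": (-0.25, 0.0, 0.0),
}
far = {name: (x + 4.0, y + 8.0, z) for name, (x, y, z) in near.items()}

# After normalization both observations describe the same skeleton.
same_pose = normalize_to_hip(near) == normalize_to_hip(far)
```

Expressing joints relative to the hip center removes the offset that depends on where the person stands, so skeletons from different frames can be compared or aggregated directly.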


Summary

Introduction

The development of three-dimensional (3D) human skeleton extraction contributes enormously to growing fields such as virtual reality and somatosensory human–computer interaction. The proposed SSSE method applies a shadow information extraction algorithm to human skeletonization [9,10,11]. With the SSSE method, six 3D joint positions in the human skeleton can be precisely extracted in outdoor scenarios using an ordinary monocular camera. Compared with current indoor 3D human skeleton extraction methods based on RGB-D cameras such as Kinect, the SSSE method reduces constraints on the choice of input device and the setup of the application environment. This paper focuses on extracting and aggregating the extra silhouette information from spatial–temporal discrete human shadows on the ground, with the aim of performing 3D human skeletonization with a monocular camera in outdoor scenarios. To this end, the SSSE method aggregates temporal–spatial discrete two-dimensional (2D) shadow information within a 3D human skeletonization procedure.
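To see why a ground shadow carries extra silhouette information, consider the basic geometry of shadow casting. The sketch below is illustrative only and is not the SSSE algorithm: it assumes a flat ground plane at z = 0 and a single directional light (for example, the sun), and the function name `project_to_ground` is hypothetical.

```python
from typing import Tuple


def project_to_ground(point: Tuple[float, float, float],
                      light_dir: Tuple[float, float, float]
                      ) -> Tuple[float, float]:
    """Return the (x, y) shadow of a 3D point on the plane z = 0.

    Assumptions: flat ground at z = 0 and a directional light whose
    rays travel along light_dir (negative z component, i.e. downward).
    """
    px, py, pz = point
    dx, dy, dz = light_dir
    if dz >= 0:
        raise ValueError("light must travel downward (negative z component)")
    t = -pz / dz  # ray parameter at which the ray reaches z = 0
    return (px + t * dx, py + t * dy)


# A joint 2 m above the ground, lit at 45 degrees along the x axis.
shadow = project_to_ground((0.0, 0.0, 2.0), (1.0, 0.0, -1.0))
```

Under these assumptions the shadow acts as a second 2D projection of the body from the direction of the light, which is the kind of additional view that a single monocular image alone does not provide.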

Basic Theory
Skeleton Simulation in Multi-Light-Source Scenarios
Silhouette Information Extraction
Skeleton Simulation in Single-Light-Source Scenario
Theoretic Proof of the Extension Solution in a Single Light Source Scenario
Temporal–Spatial Aggregation Method
Proposed Method
Data Source Description and Experimental Settings
Effective Range and Precision Analyses
Discussion
Reliable Joint Percentage Enhancement
Computational Cost Evaluation
Conclusions
Methods
