Abstract

We present a new method for capturing human motion over 360 degrees by fusing multi-view RGB-D video data from Kinect sensors. Our method reconstructs unified human motion from the fused RGB-D and skeletal data over 360 degrees and produces a single skeletal animation. We make use of all three streams (RGB, depth and skeleton), along with the joint tracking confidence states from the Microsoft Kinect SDK, to find the correctly oriented skeletons and merge them into a uniform measurement of human motion, resulting in a unified skeletal animation. We quantitatively validate the goodness of the unified motion using two evaluation techniques. Our method is easy to implement and provides a simple solution for measuring and reconstructing a plausible 360-degree unified human motion that would not be possible to capture with a single Kinect due to tracking failures, self-occlusions, limited field of view and subject orientation.
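The merging step described above can be pictured with the following minimal Python sketch (our own illustration, not the authors' released code): each joint of the unified skeleton is a confidence-weighted average of the corresponding joints reported by the front-facing cameras. The numeric weights for the Kinect SDK tracking states (Tracked, Inferred, NotTracked) and the assumption that all skeletons are already in a common world frame are ours.

```python
import numpy as np

# Assumed weights for the three per-joint tracking states the Kinect SDK reports.
STATE_WEIGHT = {"Tracked": 1.0, "Inferred": 0.5, "NotTracked": 0.0}

def merge_skeletons(skeletons, states):
    """skeletons: list of (J, 3) joint-position arrays, one per front-facing
    camera, already transformed into a common world frame.
    states: per-camera lists of length J with tracking-state strings."""
    skeletons = np.asarray(skeletons, dtype=float)          # (C, J, 3)
    w = np.array([[STATE_WEIGHT[s] for s in cam] for cam in states])  # (C, J)
    w_sum = w.sum(axis=0)                                   # (J,)
    # Confidence-weighted average per joint; joints no camera tracked stay at 0.
    merged = (skeletons * w[..., None]).sum(axis=0)         # (J, 3)
    ok = w_sum > 0
    merged[ok] /= w_sum[ok][:, None]
    return merged
```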

Highlights

  • Face detection works well in more than 90% of the frames, but it can fail when the face is occluded, for example in the boxing sequence. We solve this issue in a pre-processing step: we analyze the sequence, and if the face is missing in a few consecutive frames, we look at the frames before and after the gap under the assumption that those frames were skipped due to occlusion (see the sketch after this list)

  • We show that our method is able to reconstruct human motion over 360 degrees by fusing multiple RGB-D sensors into a plausible unified skeletal animation, which would not be possible with a single Kinect
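
The gap-filling pre-processing step from the first highlight might look like the following hedged Python sketch. The per-frame boolean detection flags and the maximum gap length are assumptions for illustration, not values from the paper.

```python
def fill_face_gaps(face_found, max_gap=5):
    """face_found: list of bools, one per frame (True = face detected).
    Returns a copy in which short gaps flanked by detections are filled,
    on the assumption that the face was briefly occluded, not turned away."""
    filled = list(face_found)
    i = 0
    while i < len(filled):
        if filled[i]:
            i += 1
            continue
        # Find the end of this run of missing detections.
        j = i
        while j < len(filled) and not filled[j]:
            j += 1
        # Fill only short interior gaps bounded by detections on both sides.
        if i > 0 and j < len(filled) and (j - i) <= max_gap:
            for k in range(i, j):
                filled[k] = True
        i = j
    return filled
```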

Introduction

The field of marker-less motion capture and 3D or free-viewpoint video has received a lot of interest in the past decade.

For every frame we need to identify which cameras can be used for reconstructing the unified skeleton. As the depth data and the skeleton and joint tracking states are not helpful in finding the correct orientation of the human actor, we use one of the standard face detection methods (Viola and Jones, 2001) over the RGB data to determine the front-facing actors.

As shown in Fig. 7(c and d), the bounding box from the unified skeleton completely overlaps the correct region of the merged 3D point clouds, resulting in a higher value of ξ_t. In this particular frame, the unified skeleton has on average 7.73% better quality compared to the individual cameras. We found that, on average, the goodness ξ_t of the unified skeleton was better than the average goodness of the individual front-facing cameras by 7% to 10%.
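The face-detection step can be sketched with OpenCV's Haar-cascade implementation of the Viola-Jones-style detector; treating every camera whose RGB frame contains a detected face as front-facing is our reading of this step, and the detector parameters below are illustrative assumptions.

```python
import cv2

# OpenCV ships a pre-trained frontal-face Haar cascade (Viola-Jones style).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def front_facing_cameras(rgb_frames):
    """rgb_frames: dict mapping camera id -> BGR image for the current frame.
    Returns the ids of cameras whose frame contains a detected face."""
    selected = []
    for cam_id, frame in rgb_frames.items():
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            selected.append(cam_id)
    return selected
```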
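The excerpt does not define ξ_t. One plausible reading, given that it rewards the skeleton's bounding box overlapping the correct region of the merged point cloud, is the fraction of point-cloud points that fall inside a padded bounding box of the skeleton's joints. The sketch below implements that reading; the padding margin and the definition itself are assumptions, not the paper's formula.

```python
import numpy as np

def goodness(joints, point_cloud, margin=0.1):
    """joints: (J, 3) skeleton joint positions; point_cloud: (N, 3) merged
    3D points; margin: assumed box padding in metres."""
    lo = joints.min(axis=0) - margin        # axis-aligned bounding box
    hi = joints.max(axis=0) + margin
    inside = np.all((point_cloud >= lo) & (point_cloud <= hi), axis=1)
    return inside.mean()                    # a ξ_t-like score in [0, 1]
```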
