Abstract

A real-time motion capture system is presented which uses input from multiple standard video cameras and inertial measurement units (IMUs). The system is able to track multiple people simultaneously and requires no optical markers, specialized infra-red cameras or foreground/background segmentation, making it applicable to general indoor and outdoor scenarios with dynamic backgrounds and lighting. To overcome limitations of prior video or IMU-only approaches, we propose to use flexible combinations of multiple-view, calibrated video and IMU input along with a pose prior in an online optimization-based framework, which allows the full 6-DoF motion to be recovered including axial rotation of limbs and drift-free global position. A method for sorting and assigning raw input 2D keypoint detections into corresponding subjects is presented which facilitates multi-person tracking and rejection of any bystanders in the scene. The approach is evaluated on data from several indoor and outdoor capture environments with one or more subjects and the trade-off between input sparsity and tracking performance is discussed. State-of-the-art pose estimation performance is obtained on the Total Capture (multi-view video and IMU) and Human 3.6M (multi-view video) datasets. Finally, a live demonstrator for the approach is presented showing real-time capture, solving and character animation using a light-weight, commodity hardware setup.
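The sorting and assignment of raw 2D keypoint detections to tracked subjects described above can be sketched as a gated optimal assignment between predicted and detected positions. This is a minimal illustration, not the paper's actual method: the use of root-joint positions, the `max_dist` gate, and the Hungarian solver are all assumptions introduced here for clarity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_detections(predicted_2d, detected_2d, max_dist=50.0):
    """Assign raw 2D detections to tracked subjects.

    predicted_2d: (S, 2) predicted 2D root positions of S tracked subjects
    detected_2d:  (D, 2) detected 2D root positions of D candidate people
    Returns a dict mapping subject index -> detection index; detections
    that match no subject within max_dist pixels (e.g. bystanders) are
    left unassigned and thereby rejected.
    """
    # Pairwise Euclidean distances between predictions and detections.
    cost = np.linalg.norm(
        predicted_2d[:, None, :] - detected_2d[None, :, :], axis=-1)
    # Globally optimal one-to-one assignment (Hungarian algorithm).
    rows, cols = linear_sum_assignment(cost)
    # Gate the assignments: far-away matches are treated as bystanders.
    return {s: d for s, d in zip(rows, cols) if cost[s, d] < max_dist}
```

With two tracked subjects and three detections, the extra detection falls outside the gate and is rejected as a bystander.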

Highlights

  • Real-time capture of human motion is of considerable interest in various domains including entertainment and the life sciences

  • We fuse multi-modal input from inertial sensors and multiple cameras to produce an estimate of the full 3D pose of one or more subjects in real time without requiring optical markers or a complex hardware setup (Fig. 1)

  • The orientation and acceleration constraints are provided by a sparse set of inertial measurement units (IMUs) attached to body segments, and positional constraints are obtained from 2D joint detections from video cameras (Cao et al. 2017)


Summary

Introduction

Real-time capture of human motion is of considerable interest in various domains including entertainment and the life sciences. There is a growing desire to move capture out of controlled studio settings to more natural, outdoor environments, and with less encumbrance of the performers from the specialized costumes and optical marker setups traditionally required. We fuse multi-modal input from inertial sensors and multiple cameras to produce an estimate of the full 3D pose of one or more subjects in real time without requiring optical markers or a complex hardware setup (Fig. 1). The orientation and acceleration constraints are provided by a sparse set of inertial measurement units (IMUs) attached to body segments, and positional constraints are obtained from 2D joint detections from video cameras (Cao et al. 2017). The IMUs provide full rotational information for body segments, while the video information provides drift-free 3D global position information.
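The fusion of drift-free positional constraints from calibrated cameras with a soft pose prior can be illustrated as a small least-squares problem for a single joint. This is only a sketch under simplifying assumptions, not the paper's solver: the toy camera matrices, the single-point state, and the prior weight `w_prior` are hypothetical, and the IMU orientation and acceleration terms of the full objective are omitted.

```python
import numpy as np
from scipy.optimize import least_squares

def project(P, X):
    """Project a 3D point X through a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def fused_residuals(X, cameras, detections, prior, w_prior=0.05):
    """Stack 2D reprojection residuals from each calibrated camera
    with a soft prior residual pulling X toward a predicted position."""
    res = [project(P, X) - d for P, d in zip(cameras, detections)]
    res.append(w_prior * (X - prior))
    return np.concatenate(res)

# Two toy calibrated cameras (hypothetical values) observing one joint.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                  # at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted in x
X_true = np.array([0.2, 0.1, 2.0])
dets = [project(P1, X_true), project(P2, X_true)]

# Online solve: start from the prior (e.g. last frame's estimate).
prior = np.array([0.0, 0.0, 1.5])
sol = least_squares(fused_residuals, x0=prior, args=([P1, P2], dets, prior))
```

Because the multi-view reprojection terms dominate the weak prior, the recovered position stays close to the true joint location without drifting toward the prediction.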

