Abstract

We present a Bayesian framework for estimating 3D human pose and camera parameters from a single RGB image. We develop a generative model in which a 3D pose is rendered onto an image (via the camera), which then generates a detection probability map for each body part. We represent a human pose with a set of 3D cylinders in space, one for each body part, and we place kinematic and self-intersection priors on the model. Importantly, we use a graphics engine (e.g., OpenGL) to render the pose, and use its built-in capabilities for color blending to efficiently compute the likelihood of the model given the observed probability maps, which are obtained by running a convolutional neural network classifier on a test image. We explore the space of 3D poses and camera configurations via the Hybrid Monte Carlo algorithm, with sampling moves designed specifically for this problem. We train the parameters of our prior and likelihood distributions using annotated poses from the CMU mocap database, and test our algorithm on two benchmark datasets, where we compare performance against state-of-the-art methods. Additionally, we demonstrate the flexibility of our framework by incorporating a likelihood function for depth images and showing the associated performance gains.

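The inference method named in the abstract is Hybrid (Hamiltonian) Monte Carlo. As a rough illustration of the base algorithm only, and not the paper's implementation (the problem-specific sampling moves and the rendering-based likelihood it mentions are not reproduced here), the following Python sketch shows a single HMC step with leapfrog integration; `neg_log_post` and `grad` are hypothetical callables standing in for the model's negative log posterior over stacked pose-and-camera parameters and its gradient.

    import numpy as np

    def hmc_step(q, neg_log_post, grad, step_size=0.01, n_leapfrog=20, rng=None):
        # One Hybrid Monte Carlo step over the parameter vector q
        # (here, imagined as stacked 3D pose and camera parameters).
        rng = rng or np.random.default_rng()
        p = rng.standard_normal(q.shape)               # sample auxiliary momentum
        current_h = neg_log_post(q) + 0.5 * (p @ p)    # Hamiltonian at the start

        # Leapfrog integration of the Hamiltonian dynamics.
        q_new = q.copy()
        p_new = p - 0.5 * step_size * grad(q_new)      # initial half-step for momentum
        for i in range(n_leapfrog):
            q_new = q_new + step_size * p_new          # full step for position
            if i < n_leapfrog - 1:
                p_new = p_new - step_size * grad(q_new)
        p_new = p_new - 0.5 * step_size * grad(q_new)  # final half-step for momentum

        # Metropolis accept/reject corrects the discretization error of the
        # leapfrog integrator, so the chain targets the exact posterior.
        proposed_h = neg_log_post(q_new) + 0.5 * (p_new @ p_new)
        if rng.random() < np.exp(current_h - proposed_h):
            return q_new, True
        return q, False

    # Toy usage: sample a 2D standard Gaussian, i.e. U(q) = 0.5 * ||q||^2.
    U = lambda q: 0.5 * (q @ q)
    dU = lambda q: q
    q = np.zeros(2)
    samples = []
    for _ in range(1000):
        q, accepted = hmc_step(q, U, dU)
        samples.append(q)

In the full method, the problem-specific moves the abstract refers to would augment or replace this generic leapfrog proposal; the sketch conveys only the propose/accept structure that underlies the sampler.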