Abstract

We address the problem of tracking 3D human pose from monocular video using a probabilistic inference method. The human body is modeled as a set of cylinders in space, each with an appearance facet and a pose facet. The appearance facets are acquired in a learning phase from the initial frames of the input video, in which a visual-hull description of the target subject, constructed from multiple images, proves instrumental. In the operation phase, the 3D pose of the target subject is tracked through the subsequent frames of the video. A bottom-up framework is used: for each current frame, tentative candidates for each body part are first extracted in the image space. The human model, with its appearance facets already learned and its pose entries initialized to those of the previous frame, is then matched to these 2D body-part candidates under a belief propagation algorithm, which enforces proper articulation between the body parts and thereby determines the 3D pose of the human body in the current frame. Tracking performance on a number of monocular videos is demonstrated.
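To make the inference step concrete, the following is a minimal sketch of max-product belief propagation on a chain of articulated parts with discrete pose hypotheses. It is an illustration under assumed simplifications, not the paper's actual model: the function name `chain_bp`, the chain topology, and the toy scores are all hypothetical, whereas the paper operates on a full body model with learned appearance terms.

```python
import numpy as np

def chain_bp(unary, pairwise):
    """Max-product BP on a chain of body parts (log-domain scores).

    unary: list of (K,) arrays, one per part -- e.g. how well each discrete
           pose hypothesis matches the extracted 2D part candidates.
    pairwise: list of (K, K) arrays between part i and i+1 -- articulation
              compatibility between adjacent parts.
    Returns the jointly best pose index for each part.
    """
    n = len(unary)
    back = [None] * n              # backpointers for decoding
    score = unary[0].copy()
    # Forward pass: propagate the best achievable score along the chain.
    for i in range(1, n):
        cand = score[:, None] + pairwise[i - 1]   # (K_prev, K_cur)
        back[i] = cand.argmax(axis=0)             # best predecessor state
        score = cand.max(axis=0) + unary[i]
    # Backward pass: decode the globally best configuration.
    states = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        states.append(int(back[i][states[-1]]))
    return states[::-1]

# Toy example: 3 parts, 2 pose hypotheses each; the pairwise term
# penalizes inconsistent poses between adjacent parts.
unary = [np.array([0.0, 1.0]), np.array([0.5, 0.0]), np.array([0.0, 0.2])]
smooth = np.array([[0.0, -2.0], [-2.0, 0.0]])
print(chain_bp(unary, [smooth, smooth]))  # -> [1, 1, 1]
```

On a chain (or any tree) this two-pass scheme is exact; the kinematic tree of a human body admits the same message-passing structure, with messages flowing from limb extremities toward the torso and back.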
