Abstract

Human motion modeling is essential for video-based 3D human pose and shape estimation. Most existing methods model human motion by learning a deterministic mapping from input videos to human body parameters, ignoring uncertainties such as occlusions and depth ambiguities. To address this problem, we propose FlowPose, a probabilistic model based on conditional normalizing flows that learns the distribution of feasible 3D human motion. Given a video input, the model gives access to the most likely 3D human poses, which yields more accurate and temporally coherent pose estimates. Additionally, a contrastive training strategy is used to maximize the mutual information between video features and their corresponding 3D human poses, which improves the feature extraction of the conditional flow model. Experimental results on two benchmarks, 3DPW and Human3.6M, demonstrate that our method outperforms state-of-the-art video-based methods.
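
To illustrate the general idea of a conditional normalizing flow, the following is a minimal one-dimensional sketch and not the FlowPose architecture. It assumes a single affine flow step whose shift and log-scale depend on a scalar conditioning feature `c` standing in for a video feature. The names `conditioner`, `log_prob`, and `most_likely`, and the coefficients inside `conditioner`, are hypothetical and chosen only for illustration. The density follows the change-of-variables rule, and in this affine case the most likely output is the image of the base distribution's mode (z = 0).

```python
import math

def conditioner(c):
    """Hypothetical conditioner: maps a scalar feature c to (shift, log_scale).
    A real model would use a neural network here; these coefficients are made up."""
    shift = 2.0 * c
    log_scale = 0.5 * c
    return shift, log_scale

def log_prob(x, c):
    """log p(x | c) for the affine flow x = shift(c) + exp(log_scale(c)) * z,
    with a standard normal base density on z."""
    shift, log_scale = conditioner(c)
    z = (x - shift) * math.exp(-log_scale)            # inverse transform
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
    return log_base - log_scale                        # + log|dz/dx| = -log_scale

def most_likely(c):
    """Mode of p(x | c). For an affine flow the Jacobian is constant in x,
    so the mode is the image of the base mode z = 0."""
    shift, _ = conditioner(c)
    return shift
```

With this setup, `log_prob(most_likely(c), c)` is at least as large as `log_prob(x, c)` for any other `x`. This mirrors, in a toy setting, the idea of reading off the most likely pose from a learned conditional distribution.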
