Abstract

The recent success of neural networks has significantly advanced 3D human pose estimation from 2D input images. However, the diversity of capturing viewpoints and the flexibility of human poses remain significant challenges. In this paper, we propose a view-invariant 3D human pose estimation module to alleviate the effects of viewpoint diversity. The proposed framework consists of a base network, which provides an initial estimate of the 3D pose; a view-invariant hierarchical correction network (VI-HC) on top of it, which learns 3D pose refinement under consistent views; and a view-invariant discriminative network (VID), which enforces high-level constraints over body configurations. In VI-HC, the initial 3D poses are automatically transformed to consistent views and then refined at the global body level and at the local body-part level. In VID, under consistent viewpoints, we use adversarial learning to distinguish estimated 3D poses from real 3D poses and thereby avoid implausible results. The experimental results demonstrate that the constraint on viewpoint consistency can dramatically enhance the performance of 3D human pose estimation. Our module is robust across different 3D pose base networks and achieves a significant improvement (about 9%) over a powerful baseline on the public 3D pose estimation benchmark Human3.6M.
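
To make the idea of a "consistent view" concrete, the following is a minimal geometric sketch and not the learned transformation used in VI-HC, whose details the abstract does not give. It assumes a pose is a list of (x, y, z) joints with y as the vertical axis, and that two joints are the left and right hips; the function name `canonicalize_view` and the joint indices are hypothetical. The pose is centered at the mid-hip point and rotated about the vertical axis so that the hip line points along +x, which removes the camera heading and translation from the representation.

```python
import math

# Hypothetical joint indices; the abstract does not specify a skeleton layout.
LEFT_HIP, RIGHT_HIP = 0, 1


def canonicalize_view(pose, left_hip=LEFT_HIP, right_hip=RIGHT_HIP):
    """Center a pose at the mid-hip and rotate it about the vertical (y) axis
    so that the right-to-left hip vector lies along +x.

    pose: list of (x, y, z) tuples, with y assumed to be the vertical axis.
    Returns a new list of (x, y, z) tuples in the canonical view.
    """
    lx, ly, lz = pose[left_hip]
    rx, ry, rz = pose[right_hip]

    # Mid-hip point, used as the origin of the canonical frame.
    cx, cy, cz = (lx + rx) / 2, (ly + ry) / 2, (lz + rz) / 2

    # Heading of the hip line in the horizontal (x, z) plane.
    theta = math.atan2(lz - rz, lx - rx)
    c, s = math.cos(theta), math.sin(theta)

    canonical = []
    for x, y, z in pose:
        x, y, z = x - cx, y - cy, z - cz
        # Rotate by -theta in the (x, z) plane; y is left unchanged.
        canonical.append((c * x + s * z, y, -s * x + c * z))
    return canonical
```

Under these assumptions, two copies of the same pose that differ only by a rotation about the vertical axis and a translation map to the same canonical coordinates, so a refinement or discriminative network applied afterwards sees a viewpoint-consistent input.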
