Abstract

Despite recent progress in 3D human pose estimation, reconstructing an accurate 3D human posture from a single image without 3D annotations remains challenging for two reasons. First, the reconstruction is inherently ambiguous, as multiple 3D poses can project onto the same 2D pose. Second, camera rotation is difficult to measure precisely without laborious camera calibration. Some approaches resort to traditional computer vision algorithms to address these issues, but those algorithms are not differentiable and cannot be optimized through training. In this paper, we propose two geometrically explicit modules that solve these problems without any 3D ground truth or camera parameters. The relative depth estimation module effectively mitigates depth ambiguity, reducing the number of possible depths for each joint to only two candidates. The differentiable pose alignment module computes the camera rotation by aligning poses from different views. Both modules are geometrically interpretable, which reduces training difficulty and leads to superior performance. Our method achieves state-of-the-art results among self-supervised methods on standard benchmark datasets and even outperforms several fully supervised approaches that rely on 3D annotations.
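
The two-candidate claim follows from elementary geometry: under a scaled-orthographic camera, a bone of known (scale-normalized) 3D length L whose projection spans (dx, dy) in the image must satisfy dz^2 = L^2 - dx^2 - dy^2, so the relative depth dz between the bone's endpoints has exactly two solutions. The minimal NumPy sketch below illustrates that relation only; it is not the paper's module, and the function name and normalization convention are assumptions.

```python
import numpy as np

def relative_depth_candidates(parent_2d, child_2d, bone_length):
    """Two possible relative depths for one bone under scaled-orthographic
    projection: dz^2 = L^2 - dx^2 - dy^2 (Pythagoras), hence exactly two
    candidates. Illustrative sketch; names and conventions are assumptions."""
    dx, dy = np.asarray(child_2d, dtype=float) - np.asarray(parent_2d, dtype=float)
    sq = bone_length ** 2 - dx ** 2 - dy ** 2
    if sq < 0:
        # 2D projection longer than the 3D bone: inconsistent scale or noisy input
        return None
    dz = np.sqrt(sq)
    return dz, -dz  # child joint in front of, or behind, its parent

# A 0.5-unit bone whose 2D projection spans (0.3, 0.2):
print(relative_depth_candidates((0.0, 0.0), (0.3, 0.2), 0.5))  # (~0.3464, ~-0.3464)
```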

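A differentiable pose alignment can likewise be realized with the SVD-based orthogonal Procrustes (Kabsch) solution, since SVD is differentiable in modern autodiff frameworks. The PyTorch sketch below is one plausible realization under that assumption, not the authors' exact module.

```python
import math
import torch

def align_rotation(P, Q):
    """Best-fit rotation aligning pose P onto pose Q (both J x 3 joint arrays)
    via orthogonal Procrustes / Kabsch. Centering, matmul, and torch.linalg.svd
    are all differentiable, so gradients flow through the recovered rotation."""
    Pc = P - P.mean(dim=0, keepdim=True)           # remove translation
    Qc = Q - Q.mean(dim=0, keepdim=True)
    H = Pc.T @ Qc                                  # 3x3 cross-covariance
    U, _, Vh = torch.linalg.svd(H)
    d = torch.sign(torch.linalg.det(Vh.T @ U.T))   # reflection guard: det(R) = +1
    D = torch.eye(3, dtype=P.dtype)
    D[2, 2] = d
    return Vh.T @ D @ U.T

# Sanity check: recover a known rotation about the z-axis.
c, s = math.cos(0.4), math.sin(0.4)
R_true = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
P = torch.randn(17, 3)                                 # a 17-joint pose
Q = P @ R_true.T + torch.tensor([0.1, -0.2, 0.3])      # rotated + translated view
print(torch.allclose(align_rotation(P, Q), R_true, atol=1e-4))  # True
```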