Abstract
3-D human pose estimation is a crucial step for understanding human actions. However, reliably capturing precise 3-D position of human joints is non-trivial and tedious. Current models often suffer from the scarcity of high-quality 3-D annotated training data. In this work, we explore a novel way of obtaining gigantic 3-D human pose data without manual annotations. In catedioptric videos (\emph{e.g.}, people dance before a mirror), the camera records both the original and mirrored human poses, which provides cues for estimating 3-D positions of human joints. Following this idea, we crawl a large-scale Dance-before-Mirror (DBM) video dataset, which is about 24 times larger than existing Human3.6M benchmark. Our technical insight is that, by jointly harnessing the epipolar geometry and human skeleton priors, 3-D joint estimation can boil down to an optimization problem over two sets of 2-D estimations. To our best knowledge, this represents the first work that collects high-quality 3-D human data via catadioptric systems. We have conducted comprehensive experiments on cross-scenario pose estimation and visualization analysis. The results strongly demonstrate the usefulness of our proposed DBM human poses.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have