Abstract

<abstract> <p>Three-dimensional human pose estimation is a key technology in many computer vision tasks. Regressing a 3D pose from 2D images is challenging, especially in natural scenes. Recovering the 3D pose from a monocular image is itself an ill-posed problem; moreover, most existing datasets were captured in laboratory environments, so models trained on them do not generalize well to in-the-wild data. In this work, we improve 3D pose estimation performance by introducing an attention mechanism and a calibration network. The attention model captures channel-wise dependencies, enhancing the model's ability to analyze depth. The multi-scale pose calibration network adaptively learns body structure and motion characteristics and uses them to rectify the estimation results. We tested our model on the Human3.6M dataset for quantitative evaluation, and the experimental results show that the proposed method achieves higher accuracy. To test generalization to in-the-wild applications, we also report qualitative results on the natural-scene Leeds Sports Pose dataset; the visualizations show that the estimated poses are more reasonable than those of the baseline model.</p> </abstract>
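
The abstract does not specify how the attention model computes channel-wise dependence. As a rough illustration only, the sketch below assumes a squeeze-and-excitation style design: pool each channel to a single value, pass the result through a small two-layer bottleneck, and rescale each channel by the resulting gate. The function name `channel_attention`, the weights `w1` and `w2`, and the reduction ratio `R` are hypothetical and are not taken from the paper.

```python
import math
import random


def channel_attention(features, w1, w2):
    """Reweight the channels of a C x H x W feature map stored as nested lists.

    w1: (C // r) x C weights of the bottleneck layer (hypothetical).
    w2: C x (C // r) weights of the expansion layer (hypothetical).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Toy input: random features and random (untrained) weights, for shape checks only.
random.seed(0)
C, H, W, R = 4, 2, 3, 2
features = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
scaled, gates = channel_attention(features, w1, w2)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another. In a trained network the weights would be learned so that channels more informative for depth receive larger gates.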
