Abstract

Human pose estimation from a monocular image has attracted considerable interest because of its potential applications in many areas. The performance of 2D human pose estimation has improved substantially with the emergence of deep convolutional neural networks. In contrast, recovering the 3D human pose from a 2D pose remains a challenging problem. Currently, most methods try to learn a universal mapping that can be applied to all human poses from any viewpoint. However, because of the large variety of human poses and camera viewpoints, it is very difficult to learn such a universal mapping from current datasets for 3D pose estimation. Instead of learning a universal mapping, we propose to learn an adaptive viewpoint transformation module, which transforms the 2D human pose to a viewpoint more suitable for recovering the 3D human pose. Specifically, our transformation module takes a 2D pose as input and predicts the transformation parameters. Rather than relying on hand-crafted criteria, this module is learned directly from the datasets and depends only on the input 2D pose in the testing phase. The 3D pose is then recovered from the transformed 2D pose. Because 3D pose recovery from the transformed viewpoint is easier, we obtain more accurate estimates. Experiments on the Human3.6M and MPII datasets show that the proposed adaptive viewpoint transformation improves the performance of 3D human pose estimation.

Highlights

  • Human pose estimation aims to estimate the 2D or 3D locations of human joints from images or videos

  • Rather than relying on hand-crafted criteria, our module is based on a deep convolutional neural network (DCNN) and is learned directly from the datasets. It depends only on the input 2D pose

  • In this paper, an adaptive viewpoint transformation network is proposed for 3D human pose estimation


Summary

Introduction

Human pose estimation aims to estimate the 2D or 3D locations of human joints from images or videos. Owing to its potential applications in human motion prediction, action analysis, and intelligent video surveillance [1]–[3], 3D human pose estimation from a monocular image has attracted increasing attention in recent years. Because depth information is lost when a person in the real world is projected onto a 2D image plane, estimating the 3D pose from a single 2D image is an ill-posed problem. For this reason, early research was restricted to simplified settings, such as specified actions or fixed backgrounds.
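The ill-posedness mentioned above can be illustrated with a small sketch. It assumes an ideal pinhole camera with unit focal length, and the joint coordinates are hypothetical values chosen only for illustration; neither comes from the paper. Moving each joint along its own camera ray changes the 3D pose but leaves the 2D projection unchanged.

```python
def project(points_3d, focal_length=1.0):
    """Pinhole projection (assumed camera model): (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in points_3d]


def scale_along_rays(points_3d, factors):
    """Slide each joint along its camera ray by scaling (X, Y, Z) by a positive factor."""
    return [(s * x, s * y, s * z) for (x, y, z), s in zip(points_3d, factors)]


# Hypothetical joints in camera coordinates (metres): shoulder, elbow, wrist.
pose_a = [(0.0, 1.4, 4.0), (0.25, 1.1, 4.0), (0.5, 0.9, 4.0)]

# A second pose: the elbow is pushed away from the camera, the wrist pulled closer.
pose_b = scale_along_rays(pose_a, [1.0, 1.25, 0.75])

proj_a = project(pose_a)
proj_b = project(pose_b)

# The two 3D poses differ, yet their 2D projections coincide.
same_2d = all(
    abs(u1 - u2) < 1e-9 and abs(v1 - v2) < 1e-9
    for (u1, v1), (u2, v2) in zip(proj_a, proj_b)
)
different_3d = pose_a != pose_b
```

Here `same_2d` and `different_3d` both hold, so a method that sees only the 2D joints cannot distinguish the two poses without additional prior knowledge, such as what a learned model captures from training data.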

