Abstract

In this paper, we address the problem of 3D human mesh reconstruction from a single 2D human pose based on deep learning. We propose MeshLifter, a network that estimates a 3D human mesh from an input 2D human pose. Unlike most existing 3D human mesh reconstruction studies, which train models on paired 2D and 3D data, we propose a weakly supervised learning method based on a loop structure to train the MeshLifter. The proposed method alleviates the difficulty of obtaining ground-truth 3D data, allowing the MeshLifter to be trained from a 2D human pose dataset and an unpaired 3D motion capture dataset. We compare the proposed method with recent state-of-the-art studies through various experiments and show that it achieves effective 3D human mesh reconstruction performance. Notably, the proposed method achieves a reconstruction error of 59.1 mm without using the 3D ground-truth data of Human3.6M, the standard dataset for 3D human mesh reconstruction.
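The loop-based weak supervision described above can be illustrated, in simplified form, by a reprojection-consistency loss: the predicted 3D pose is projected back into the image and compared against the input 2D pose, so no paired 3D ground truth is required. The weak-perspective camera and the loss below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def orthographic_project(joints_3d, scale, trans):
    """Weak-perspective projection of 3D joints (J, 3) to 2D (J, 2)."""
    return scale * joints_3d[:, :2] + trans

def reprojection_loss(joints_3d, joints_2d, scale, trans):
    """Mean squared distance between the input 2D pose and the
    reprojection of the predicted 3D pose (the loop-consistency term)."""
    reproj = orthographic_project(joints_3d, scale, trans)
    return float(np.mean(np.sum((reproj - joints_2d) ** 2, axis=-1)))
```

In a training loop, this loss would be minimized jointly over the network weights and the per-sample camera parameters, driving the lifted 3D pose to stay consistent with the 2D evidence.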

Highlights

  • Intelligent sensors such as the Microsoft Kinect can recognize human body motion and have been successfully used in various applications such as human–computer interaction, virtual reality, and intelligent robots

  • We show that the MeshLifter can successfully reconstruct a 3D human mesh from a noisy input 2D human pose

  • We use the reconstruction error as the evaluation metric: the mean per joint position error (MPJPE) computed after aligning the scale and global rotation of the predicted 3D pose to the ground-truth 3D pose via Procrustes analysis [15]
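The Procrustes-aligned MPJPE metric mentioned in the highlights can be sketched as follows, using the standard similarity-transform (Umeyama) solution via SVD. This is a generic implementation of the metric, not code from the paper; joint arrays are assumed to have shape (J, 3).

```python
import numpy as np

def procrustes_align(pred, gt):
    """Align pred (J, 3) onto gt (J, 3) with the optimal similarity
    transform (scale, rotation, translation) in the least-squares sense."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    X, Y = pred - mu_p, gt - mu_g          # center both point sets
    U, S, Vt = np.linalg.svd(X.T @ Y)      # cross-covariance SVD
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))  # rule out reflections
    R = Vt.T @ D @ U.T                     # optimal rotation
    s = np.trace(np.diag(S) @ D) / np.sum(X ** 2)  # optimal scale
    return s * X @ R.T + mu_g

def pa_mpjpe(pred, gt):
    """Mean per-joint position error after Procrustes alignment."""
    aligned = procrustes_align(pred, gt)
    return float(np.mean(np.linalg.norm(aligned - gt, axis=-1)))
```

Because scale, global rotation, and translation are factored out, this metric measures only the residual pose error, which is why it is the standard "reconstruction error" reported on Human3.6M.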

Introduction

Intelligent sensors such as the Microsoft Kinect can perform human body motion recognition and have been successfully used in various applications such as human–computer interaction, virtual reality, and intelligent robots. The recent rapid development of data-driven approaches, including deep learning, has made it possible to analyze human body motion with ordinary red, green, and blue (RGB) image sensors rather than depth sensors such as the Microsoft Kinect. In computer vision, research on 2D and 3D human pose estimation from a single RGB image has advanced considerably in recent years [1,2]. However, these studies generate only sparse keypoints of the human subject. In contrast, the SMPL body model parameterizes the variation of the 3D human mesh using low-dimensional latent variables, such as pose and shape
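The low-dimensional parameterization idea behind SMPL can be illustrated with a toy linear shape model: a fixed template mesh plus a linear combination of shape blend shapes. The vertex count (6890) and shape-coefficient count (10) match SMPL, but the template and basis values below are random stand-ins, and the real SMPL additionally includes pose-dependent blend shapes and linear blend skinning, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
V, S = 6890, 10                            # SMPL's vertex and shape-coefficient counts
template = rng.normal(size=(V, 3))         # stand-in for the mean ("template") mesh
shape_basis = rng.normal(size=(S, V, 3))   # stand-in per-vertex shape blend shapes

def shaped_mesh(beta):
    """Return the template mesh offset by a linear combination of
    shape blend shapes, weighted by the shape coefficients beta (S,)."""
    return template + np.tensordot(beta, shape_basis, axes=1)

verts = shaped_mesh(np.zeros(S))           # beta = 0 recovers the template mesh
```

The key point is dimensionality: a full mesh of 6890 × 3 vertex coordinates is controlled by only 10 shape coefficients (plus pose parameters in the full model), which is what makes regressing a mesh from sparse 2D keypoints tractable.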

