Abstract

Emerging technologies such as AR/VR and HCI are driving demand for more comprehensive hand understanding, which requires not only the 3D hand skeleton pose but also the hand shape geometry. In this paper, we propose a deep learning framework that produces 3D hand shape from a single depth image. Because capturing ground-truth 3D hand shape for a training dataset is non-trivial, we leverage synthetic data to construct a statistical hand shape model and adopt weak supervision from widely accessible hand skeleton pose annotations. To bridge the gap caused by differing hand skeleton definitions across existing public datasets, we propose a joint regression network for hand pose adaptation. To reconstruct the hand shape, we use a Chamfer loss between the predicted hand shape and the point cloud derived from the input depth image, which lets us learn the shape reconstruction model in a weakly supervised manner. Experiments demonstrate that our model adapts well to real data and produces accurate hand shapes, outperforming state-of-the-art methods both qualitatively and quantitatively.
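
For readers unfamiliar with the Chamfer loss mentioned above, the following is a minimal NumPy sketch of a symmetric Chamfer distance between two point sets. It is an illustration only: the abstract does not state whether the loss is symmetric or one-sided, whether distances are squared, or how terms are normalized, so those choices, along with the function name `chamfer_distance` and its arguments, are assumptions made here and not the paper's formulation.

```python
import numpy as np


def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3).

    Assumption: squared Euclidean distances, averaged per direction and summed.
    The paper's exact variant is not given in the abstract.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Pairwise squared distances via broadcasting, shape (N, M).
    diff = pred[:, None, :] - target[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # For each predicted point, the squared distance to its nearest target point.
    pred_to_target = sq_dist.min(axis=1).mean()
    # For each target point, the squared distance to its nearest predicted point.
    target_to_pred = sq_dist.min(axis=0).mean()

    return pred_to_target + target_to_pred
```

In a setting like the one described, `pred` would hold points on the predicted hand surface and `target` the points back-projected from the depth image. Because no point-to-point correspondence is needed, a loss of this kind can supervise shape without ground-truth meshes. The dense pairwise matrix costs O(NM) memory, so larger point clouds would typically use a nearest-neighbour structure instead.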

