Abstract

3-D hand pose estimation is an essential problem in human–computer interaction. Most existing depth-based hand pose estimation methods consume 2-D depth maps or 3-D volumes via 2-D/3-D convolutional neural networks. In this paper, we propose a deep semantic hand pose regression network (SHPR-Net) for hand pose estimation from point sets, which consists of two subnetworks: a semantic segmentation subnetwork and a hand pose regression subnetwork. The semantic segmentation network assigns a semantic label to each point in the point set. The pose regression network integrates the semantic priors through both input and late fusion strategies and regresses the final hand pose. Two transformation matrices are learned from the point set and applied to transform the input point cloud and inversely transform the output pose, respectively, which makes the SHPR-Net more robust to geometric transformations. Experiments on the NYU, ICVL, and MSRA hand pose datasets demonstrate that our SHPR-Net achieves performance on par with state-of-the-art methods. We also show that our method can be naturally extended to hand pose estimation from multi-view depth data and achieves further improvement on the NYU dataset.
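The abstract outlines a pipeline of learned input/output transforms, per-point semantic segmentation, and semantics-fused pose regression. Below is a minimal PyTorch-style sketch of that forward pass, assuming PointNet-style shared per-point MLPs; the module widths, fusion details, part count, and joint count are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of the SHPR-Net forward pass described in the abstract.
# Layer sizes, part/joint counts, and fusion details are assumptions.
import torch
import torch.nn as nn

def mlp(channels):
    """Shared per-point MLP implemented with 1x1 convolutions."""
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        layers += [nn.Conv1d(c_in, c_out, 1), nn.BatchNorm1d(c_out), nn.ReLU()]
    return nn.Sequential(*layers)

class TNet(nn.Module):
    """Predicts a 3x3 transform from the point set (PointNet-style T-Net)."""
    def __init__(self):
        super().__init__()
        self.feat = mlp([3, 64, 128])
        self.fc = nn.Linear(128, 9)
        nn.init.zeros_(self.fc.weight)
        self.fc.bias.data.copy_(torch.eye(3).flatten())  # start near identity

    def forward(self, pts):                       # pts: (B, 3, N)
        g = self.feat(pts).max(dim=2).values      # global feature (B, 128)
        return self.fc(g).view(-1, 3, 3)

class SHPRNetSketch(nn.Module):
    def __init__(self, num_parts=6, num_joints=14):
        super().__init__()
        self.tnet = TNet()
        self.seg = mlp([3, 64, 128, num_parts])             # per-point labels
        self.enc = mlp([3 + num_parts, 64, 128, 256])       # input fusion
        self.head = nn.Linear(256 + num_parts, num_joints * 3)  # late fusion
        self.num_joints = num_joints

    def forward(self, pts):                        # pts: (B, 3, N)
        T = self.tnet(pts)
        x = torch.bmm(T, pts)                      # transform input cloud
        sem = self.seg(x).softmax(dim=1)           # semantic priors (B, P, N)
        feat = self.enc(torch.cat([x, sem], 1)).max(dim=2).values
        sem_global = sem.mean(dim=2)               # pooled semantic prior
        joints = self.head(torch.cat([feat, sem_global], 1))
        joints = joints.view(-1, 3, self.num_joints)
        # inverse-transform the regressed pose back to the original frame
        return torch.linalg.solve(T, joints)       # T^{-1} @ joints

pts = torch.randn(2, 3, 1024)                      # batch of 1024-point clouds
print(SHPRNetSketch()(pts).shape)                  # torch.Size([2, 3, 14])
```

In this sketch the semantic priors enter twice, matching the two fusion strategies named in the abstract: concatenated with the point coordinates before encoding (input fusion) and concatenated with the pooled global feature before the regression head (late fusion).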
