Abstract

In this paper, we propose a deep-learning-based approach for real-time hand pose estimation from a single depth image. The approach uses a 3D convolutional neural network whose input is a 3D voxelized grid generated from the depth image. Most previous work on hand pose estimation takes a single 2D depth image as input and estimates the coordinates of the hand's key points with a 2D convolutional neural network. The disadvantage of those methods is that a 2D depth image cannot represent the spatial information of 3D data, whereas a 3D voxelized grid represents the point cloud of the hand surface directly in 3D space. Hence, we design a 3D convolutional neural network that takes a 3D voxelized grid with data padding as input, and we stabilize the estimated hand skeleton with an additional loss function for regression. Experiments show that our approach outperforms previous methods on two public datasets and runs in real time on a single GPU.
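
To make the input representation concrete, the following is a minimal sketch of how a depth image might be turned into a voxelized occupancy grid. It is not the pipeline used in the paper: the function name `depth_to_voxels`, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the grid size of 32, and the use of an empty border of `pad` voxels as "padding" are all assumptions made for illustration, and the paper's data padding may work differently.

```python
import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy, grid=32, pad=2):
    """Back-project a depth image to 3D points and bin them into a binary grid.

    All parameters are hypothetical; the grid keeps `pad` empty voxels per side.
    """
    v, u = np.nonzero(depth > 0)               # rows, cols of valid depth pixels
    z = depth[v, u].astype(np.float64)
    x = (u - cx) * z / fx                      # pinhole back-projection
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)          # (N, 3) surface point cloud

    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    if len(pts) == 0:
        return vox                             # no valid depth: empty grid

    lo = pts.min(axis=0)
    # One common scale for all axes keeps the hand's aspect ratio.
    extent = max(float((pts.max(axis=0) - lo).max()), 1e-6)
    inner = grid - 2 * pad                     # voxels available inside the border
    idx = np.floor((pts - lo) / extent * (inner - 1)).astype(int) + pad
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0 # mark occupied voxels
    return vox
```

A grid produced this way has shape `(32, 32, 32)` and could be fed to a 3D convolutional network as a single-channel volume. In practice the depth image would first be cropped to the hand region, a step this sketch leaves out.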
