Abstract

3D human hand pose estimation (HPE) is an essential methodology for smart human-computer interfaces. In particular, 3D hand pose estimation without attached or hand-held sensors provides a more natural and convenient form of interaction. In this work, we present an HPE system that uses a single RGB-Depth camera and deep learning to recognize the 3D poses of both hands in real time. Our HPE system consists of four steps: hand detection and segmentation, right/left hand classification with a Convolutional Neural Network (CNN) classifier, hand pose estimation with a deep CNN regressor, and 3D hand pose reconstruction. First, both hands are detected and segmented from the RGB and depth images using skin detection and depth cutting algorithms. Second, a CNN classifier distinguishes the right hand from the left; it consists of three convolutional layers and two fully connected layers, and takes the segmented depth images as input. Third, a trained deep CNN regressor estimates sixteen key hand joints in 3D from the segmented left- and right-hand depth images separately; the regressor is hierarchically composed of multiple convolutional layers, pooling layers, and dense fully connected layers. Finally, the 3D pose of each hand is reconstructed from the estimated joints. The results show that our CNN classifier distinguishes the right and left hands with an accuracy of 96.9%, and the 3D hand poses are estimated with an average distance error of 8.48 mm. The presented HPE system can be used in various application fields, including medical VR, AR, and MR, and should enable natural hand-gesture interfaces for interacting with medical content.
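The abstract describes the first step only as "skin detection and depth cutting" without implementation details. The following is a minimal sketch of one common realization, assuming an HSV-range skin mask combined with a fixed depth window; the threshold values, depth range, and minimum-area filter are illustrative assumptions, not taken from the paper.

```python
import cv2
import numpy as np

def segment_hands(rgb, depth, depth_near=300, depth_far=800):
    """Segment hand candidates by intersecting a skin-color mask (from RGB)
    with a depth cut that keeps only near-range pixels. All thresholds
    here are hypothetical and would need tuning per camera and lighting."""
    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
    # A commonly used HSV skin range (assumed, not from the paper).
    skin_mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    # Depth cutting: keep pixels within [depth_near, depth_far] millimeters.
    depth_mask = cv2.inRange(depth, depth_near, depth_far)
    mask = cv2.bitwise_and(skin_mask, depth_mask)
    # Remove speckle noise before extracting connected components.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    # Each sufficiently large component is a hand candidate (x, y, w, h).
    boxes = [stats[i, :4] for i in range(1, n)
             if stats[i, cv2.CC_STAT_AREA] > 500]
    return mask, boxes
```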
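The abstract does specify the classifier's layout: three convolutional layers followed by two fully connected layers, operating on segmented depth images with a binary left/right output. A sketch of a network matching that description is shown below; the channel counts, kernel sizes, and the 96x96 input resolution are assumptions for illustration.

```python
import torch
import torch.nn as nn

class HandSideClassifier(nn.Module):
    """Left/right-hand classifier matching the abstract's stated layout:
    three convolutional layers plus two fully connected layers.
    Channel counts, kernel sizes, and 96x96 input are assumed."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),   # 96 -> 48
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 48 -> 24
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24 -> 12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
            nn.Linear(128, 2),  # logits for the two classes: left, right
        )

    def forward(self, x):  # x: (N, 1, 96, 96) segmented depth crops
        return self.classifier(self.features(x))

# Usage: class probabilities for a batch of segmented depth crops.
logits = HandSideClassifier()(torch.randn(4, 1, 96, 96))
probs = torch.softmax(logits, dim=1)
```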
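For the last two steps, the abstract only states that the regressor ends in dense layers predicting sixteen joints in 3D and that the final pose is reconstructed from those joints. The sketch below therefore shows a hypothetical regression head emitting 16 x 3 = 48 values per hand, plus a standard pinhole-camera back-projection that such a reconstruction could use; the head dimensions and the intrinsics-based reconstruction are assumptions, not the paper's confirmed method.

```python
import torch
import torch.nn as nn

# Hypothetical regression head: a conv/pool backbone (not shown) feeds
# dense layers that output 16 joints x 3 coordinates = 48 values per hand.
regressor_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 12 * 12, 1024), nn.ReLU(),
    nn.Linear(1024, 16 * 3),
)

def reconstruct_3d(joints_uvz, fx, fy, cx, cy):
    """Back-project predicted (u, v, z) joints to camera-space XYZ using a
    pinhole model; fx, fy, cx, cy are the RGB-D camera intrinsics."""
    u, v, z = joints_uvz[..., 0], joints_uvz[..., 1], joints_uvz[..., 2]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return torch.stack([x, y, z], dim=-1)  # shape (..., 16, 3)
```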
