Abstract
Hand pose estimation from a single depth image has recently received significant attention owing to its importance in many applications requiring human–computer interaction. The rapid progress of convolutional neural networks (CNNs) and technological advances in low-cost depth cameras have greatly improved the performance of hand pose estimation methods. Nevertheless, regressing joint coordinates remains a challenging task because of joint flexibility and self-occlusion. Previous hand pose estimation methods rely on deep and complex network structures without fully utilizing hand joint connections. A hand is an articulated object consisting of six parts: the palm and five fingers. Kinematic constraints can be obtained by modeling the dependency between adjacent joints. This paper proposes a novel CNN-based approach that incorporates hand joint connections into features through both global relation inference over the entire hand and local relation inference for each finger. Modeling the relations between hand joints alleviates critical problems of occlusion and self-similarity. We also present a hierarchical structure with six branches that independently estimate the positions of the palm and five fingers, adding the connections of each joint through graph reasoning based on graph convolutional networks. Experimental results on public hand pose datasets show that the proposed method achieves the best accuracy among state-of-the-art methods. In addition, the proposed method can be used in real-time applications, running at 103 fps on a single GPU.
Highlights
Hand pose estimation is the task of predicting the position and orientation of the palm and fingers when given volumetric data captured by a depth camera
This paper proposes a novel network that applies a graph convolutional network (GCN)-based graph reasoning module (GRM) to obtain features containing more useful context information within a 2D convolutional neural network (CNN) framework
Experimental results on three public hand pose estimation datasets show that the proposed method achieves better performance than previous state-of-the-art methods in terms of accuracy and efficiency
Summary
Hand pose estimation is the task of predicting the position and orientation of the palm and fingers when given volumetric data captured by a depth camera. Although many studies have sought to improve the performance of hand pose estimation, it remains a challenging task owing to constraints from the physiology of the hand, such as its high degree of flexibility, occlusions, local self-similarity, the small hand area in the image, and noise from the depth camera. One category of approaches is the discriminative (data-driven) method [12]–[17], which learns joint positions directly from dataset images and is the most commonly used approach for hand pose estimation
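The summary above describes a discriminative approach in which a GCN-based graph reasoning module injects the hand's kinematic connections into CNN features. As a rough illustration of the underlying idea only (not the paper's implementation), the sketch below builds a hypothetical 21-joint hand skeleton graph and runs one symmetric-normalized GCN propagation step over per-joint features; the joint ordering, feature width, and random weights are all assumptions.

```python
import numpy as np

# Hypothetical hand skeleton: 21 joints (1 wrist/palm root + 5 fingers x 4 joints).
# Assumed ordering: joint 0 is the wrist; joints 4f+1 .. 4f+4 form the chain of finger f.
NUM_JOINTS = 21

def hand_adjacency():
    """Binary adjacency of the kinematic chain, with self-loops added."""
    A = np.eye(NUM_JOINTS)
    for f in range(5):  # five fingers, each rooted at the wrist
        chain = [0] + [4 * f + k for k in range(1, 5)]
        for a, b in zip(chain, chain[1:]):
            A[a, b] = A[b, a] = 1.0
    return A

def gcn_layer(H, A, W):
    """One GCN propagation step: H' = ReLU(D^{-1/2} A D^{-1/2} H W)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0) # ReLU

rng = np.random.default_rng(0)
H = rng.standard_normal((NUM_JOINTS, 64))   # stand-in for per-joint CNN features
W = rng.standard_normal((64, 64)) * 0.1     # learnable weights (random here)
H_out = gcn_layer(H, hand_adjacency(), W)
print(H_out.shape)  # (21, 64)
```

Each propagation step mixes a joint's features with those of its kinematic neighbors, which is how graph reasoning can supply the adjacency constraints that plain 2D convolutions over a depth image do not encode.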