Abstract

The field of 3D hand pose estimation has attracted considerable attention recently, due to its significance in several applications that require human-computer interaction (HCI). Technological advances, such as cost-efficient depth cameras coupled with the explosive progress of Deep Neural Networks (DNNs), have led to a significant boost in the development of robust markerless 3D hand pose estimation methods. Nonetheless, finger occlusions and rapid motions still pose significant challenges to the accuracy of such methods. In this survey, we provide a comprehensive study of the most representative deep learning-based methods in the literature and propose a new taxonomy based primarily on the input data modality: RGB, depth, or multimodal information. Finally, we demonstrate results on the most popular RGB and depth-based datasets and discuss potential research directions in this rapidly growing field.

Highlights

  • Markerless hand pose estimation can be defined as the task of predicting the position and orientation of the hand and fingers relative to some coordinate system, given an RGB image and/or volumetric data captured from a depth camera

  • Although there has been rapid progress in hand pose estimation, it remains a challenging task with unique characteristics and difficulties, due to hardware limitations and constraints arising from the physiology of hands

  • We provide a comprehensive overview of the modern deep learning methods on 3D hand pose estimation, along with a brief overview of earlier machine learning methods for context

Summary

Introduction

Markerless hand pose estimation can be defined as the task of predicting the position and orientation of the hand and fingers relative to some coordinate system, given an RGB image and/or volumetric data captured from a depth camera. Recent end-to-end Deep Neural Networks (DNNs), e.g., convolutional neural networks (CNNs) [31], recurrent neural networks (RNNs) [32], auto-encoders [33], and generative adversarial networks (GANs) [34], have proven capable of extracting much more meaningful features from the available data than previous handcrafted techniques. The employment of such networks has further raised the performance of methods that work with complex data, such as RGB images.
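To make the task definition above concrete, the following minimal sketch shows one way the output of a hand pose estimator could be expressed relative to a coordinate system, here the camera frame. It is an illustration under stated assumptions, not a method taken from the survey: the pinhole camera model, the names `CameraIntrinsics`, `backproject`, and `pose_from_keypoints`, and the intrinsic values used in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A 3D point in the camera coordinate system (assumed units: metres).
Point3D = Tuple[float, float, float]


@dataclass
class CameraIntrinsics:
    """Pinhole camera parameters (hypothetical container for this sketch)."""
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def backproject(u: float, v: float, depth: float, cam: CameraIntrinsics) -> Point3D:
    """Map a pixel (u, v) with a measured depth to a 3D camera-frame point."""
    x = (u - cam.cx) * depth / cam.fx
    y = (v - cam.cy) * depth / cam.fy
    return (x, y, depth)


def pose_from_keypoints(
    keypoints_uvd: List[Tuple[float, float, float]], cam: CameraIntrinsics
) -> List[Point3D]:
    """Turn per-joint (u, v, depth) estimates into a list of 3D joint positions."""
    return [backproject(u, v, d, cam) for (u, v, d) in keypoints_uvd]


# Assumed intrinsics and two illustrative joint estimates (not real data).
cam = CameraIntrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
pose = pose_from_keypoints([(320.0, 240.0, 0.5), (420.0, 240.0, 0.5)], cam)
```

In this representation a hand pose is simply an ordered list of 3D joint positions. The number and ordering of joints vary between datasets, so the sketch leaves them open.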

Previously Proposed Categorizations
Early Machine Learning Methods
Deep Learning Methods
Depth-Based Approaches
RGB-Based Approaches
Model-Free Approaches
Model-Based Approaches
Multimodal Approaches
Unimodal Inference
Multimodal Inference
Datasets
Metrics
Conclusions and Future Directions