Abstract

Most existing methods for 3D hand pose estimation operate on a single depth map. In that setting, depth information missing from the input frames, caused by hand self-occlusion and limited imaging quality, leads to multi-valued mappings and sub-optimal models. In this paper, we propose a novel recurrent architecture named Attention-based Pose Sequence Machine (APSM) that alleviates these challenges by introducing temporal consistency. For the recurrent unit (RU), we extend the traditional Gated Recurrent Unit (GRU) with 3D convolutional neural networks (CNNs) to handle voxelized inputs and features, and we propose a novel RU named Deep Gated Recurrent Unit (DGRU) that rebuilds deeper gates on top of the GRU. To further improve model performance, we propose a novel spatial attention mechanism denoted Attention Model (AM). Ablation experiments validate each contribution of our work, and experiments on two publicly available datasets show that our method outperforms the state of the art in hand pose estimation.

Highlights

  • Accurate 3D hand pose estimation is a critical technology for diverse human-computer interaction applications, such as virtual or augmented reality [1], driver interaction [2], and sign language recognition [3]–[5]

  • For the Recurrent Unit (RU), we propose a novel variant named Deep Gated Recurrent Unit (DGRU), which focuses on rebuilding deeper feature-extraction gates

  • We propose a novel recurrent architecture named Attention-based Pose Sequence Machine (APSM) for hand pose estimation, which introduces temporal consistency to alleviate the challenges caused by missing depth information


Summary

INTRODUCTION

Accurate 3D hand pose estimation is a critical technology for diverse human-computer interaction applications, such as virtual or augmented reality [1], driver interaction [2], and sign language recognition [3]–[5]. The image pairs illustrated in [14] require precise one-to-one correspondences, which are hard to guarantee from a single frame when the real errors are large. Both challenges are attributed to missing depth information of the 3D human hand, and most recent discriminative approaches [12]–[15] estimate hand pose from a single depth image, which usually leads to sub-optimal models and multi-valued mappings.

1. We propose a novel recurrent architecture named APSM for hand pose estimation, which introduces temporal consistency to alleviate the challenges caused by missing depth information.

3. A novel spatial attention model denoted AM is introduced to weight the input features, and ablation experiments show that AM helps to improve estimation accuracy.
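The abstract describes a GRU whose gates use 3D convolutions so that it can process voxelized inputs and features across a sequence of frames. The following is a minimal NumPy sketch of that general idea (a convolutional GRU cell over 3D volumes), not the authors' implementation. The names `conv3d_same`, `ConvGRU3DCell`, and `run_sequence`, the kernel size, the channel counts, and the random initialization are all illustrative assumptions, and the deeper gates of DGRU and the attention model are not modeled here.

```python
import numpy as np


def conv3d_same(x, w, b):
    """Naive 3D cross-correlation with zero padding ('same' output size).

    x: (C_in, D, H, W), w: (C_out, C_in, k, k, k) with odd k, b: (C_out,).
    """
    c_out, _, k, _, _ = w.shape
    r = k // 2
    _, D, H, W = x.shape
    xp = np.pad(x, ((0, 0), (r, r), (r, r), (r, r)))  # zero padding
    out = np.zeros((c_out, D, H, W))
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                patch = xp[:, dz:dz + D, dy:dy + H, dx:dx + W]
                # Contract over input channels for this kernel offset.
                out += np.einsum("oc,cdhw->odhw", w[:, :, dz, dy, dx], patch)
    return out + b[:, None, None, None]


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class ConvGRU3DCell:
    """Hypothetical GRU cell whose gates are 3D convolutions (illustrative only)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_ch, in_ch + hid_ch, k, k, k)
        self.hid_ch = hid_ch
        self.w_z = 0.1 * rng.standard_normal(shape)  # update gate weights
        self.w_r = 0.1 * rng.standard_normal(shape)  # reset gate weights
        self.w_h = 0.1 * rng.standard_normal(shape)  # candidate state weights
        self.b_z = np.zeros(hid_ch)
        self.b_r = np.zeros(hid_ch)
        self.b_h = np.zeros(hid_ch)

    def step(self, x, h):
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv3d_same(xh, self.w_z, self.b_z))  # update gate
        r = sigmoid(conv3d_same(xh, self.w_r, self.b_r))  # reset gate
        xrh = np.concatenate([x, r * h], axis=0)
        h_tilde = np.tanh(conv3d_same(xrh, self.w_h, self.b_h))
        # One common GRU convention; some papers swap the roles of z and 1 - z.
        return (1.0 - z) * h + z * h_tilde


def run_sequence(cell, frames):
    """Feed voxel frames of shape (T, C_in, D, H, W) through the cell in order."""
    h = np.zeros((cell.hid_ch,) + frames.shape[2:])
    for x in frames:
        h = cell.step(x, h)
    return h
```

As a usage example, `run_sequence(ConvGRU3DCell(in_ch=1, hid_ch=4), frames)` with `frames` of shape `(3, 1, 6, 6, 6)` returns a hidden state of shape `(4, 6, 6, 6)` that summarizes the three frames. Carrying this state from frame to frame is what lets a recurrent model use earlier frames when depth is missing in the current one.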

RELATED WORKS
GATED RECURRENT UNIT
DEEP GATED RECURRENT UNIT
ATTENTION MODEL
NETWORK TESTING
Findings
CONCLUSION
