Abstract
Most existing methods for 3D hand pose estimation operate on a single depth map. In that setting, depth information missing from the input frames, caused by hand self-occlusion and limited imaging quality, leads to multi-valued mappings and sub-optimal models. In this paper, we propose a novel recurrent architecture named Attention-based Pose Sequence Machine (APSM) that alleviates these challenges by introducing temporal consistency. For the recurrent unit (RU), we extend the traditional Gated Recurrent Unit (GRU) with 3D convolutional neural networks (CNNs) to handle voxelized inputs and features, and we propose a novel RU named Deep Gated Recurrent Unit (DGRU) by rebuilding deeper gates based on the GRU. To improve model performance, we also propose a novel spatial attention mechanism, denoted Attention Model (AM). Ablation experiments validate each contribution of our work, and experiments on two publicly available datasets show that our method outperforms the state of the art in hand pose estimation.
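As a rough illustration of what "extending the GRU with 3D convolutions" can look like, the sketch below runs one step of a standard GRU in which every matrix product is replaced by a single-channel 3D convolution over a voxel grid. It is not the paper's DGRU: the gate layout is the textbook GRU, and the helper names (`emap`, `conv3d_same`, `conv_gru_step`), the kernel dictionary keys, and the single-channel, bias-free setup are all assumptions made for this example.

```python
import math


def sigmoid(v):
    """Logistic function used by the update and reset gates."""
    return 1.0 / (1.0 + math.exp(-v))


def emap(f, *vols):
    """Apply f elementwise across same-shaped D x H x W nested lists."""
    return [[[f(*vals) for vals in zip(*rows)]
             for rows in zip(*planes)]
            for planes in zip(*vols)]


def conv3d_same(vol, kernel):
    """Single-channel 3D cross-correlation with zero padding.

    `kernel` is a cubic nested list with odd side length; the output has
    the same D x H x W shape as `vol`.
    """
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    k = len(kernel)
    p = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(D)]
    for d in range(D):
        for h in range(H):
            for w in range(W):
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        for l in range(k):
                            dd, hh, ww = d + i - p, h + j - p, w + l - p
                            # Positions outside the grid count as zeros.
                            if 0 <= dd < D and 0 <= hh < H and 0 <= ww < W:
                                acc += vol[dd][hh][ww] * kernel[i][j][l]
                out[d][h][w] = acc
    return out


def conv_gru_step(x, h, kernels):
    """One GRU step on voxel grids (hypothetical single-channel variant).

    x: input voxel grid, h: previous hidden state, both D x H x W.
    kernels: dict with keys 'xz', 'hz', 'xr', 'hr', 'xc', 'hc'.
    """
    # Update gate z and reset gate r, each from input and hidden state.
    z = emap(lambda a, b: sigmoid(a + b),
             conv3d_same(x, kernels["xz"]), conv3d_same(h, kernels["hz"]))
    r = emap(lambda a, b: sigmoid(a + b),
             conv3d_same(x, kernels["xr"]), conv3d_same(h, kernels["hr"]))
    # Candidate state uses the reset-gated hidden state.
    rh = emap(lambda a, b: a * b, r, h)
    c = emap(lambda a, b: math.tanh(a + b),
             conv3d_same(x, kernels["xc"]), conv3d_same(rh, kernels["hc"]))
    # Blend previous state and candidate, voxel by voxel.
    return emap(lambda zz, hh, cc: (1.0 - zz) * hh + zz * cc, z, h, c)
```

Calling `conv_gru_step` once per depth frame and feeding each returned state back in as `h` is how temporal information would carry across a sequence in this sketch. Note that some references swap the roles of `z` and `1 - z` in the final blend; the two conventions are equivalent up to relabeling the gate.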
Highlights
Accurate 3D hand pose estimation is a critical technology for diverse human-computer interaction applications, such as virtual or augmented reality [1], driver interaction [2], and sign language recognition [3]–[5]
For the recurrent unit (RU), we propose a novel variant named Deep Gated Recurrent Unit (DGRU), which focuses on rebuilding deeper feature extraction gates
We propose a novel recurrent architecture named Attention-based Pose Sequence Machine (APSM) for hand pose estimation, which introduces temporal consistency to alleviate the depth missing challenges
Summary
Accurate 3D hand pose estimation is a critical technology for diverse human-computer interaction applications, such as virtual or augmented reality [1], driver interaction [2], and sign language recognition [3]–[5]. Image pairs illustrated in [14] require precise one-to-one correspondences, which are hard for a single frame to guarantee when the real errors are large. Both challenges are attributed to missing depth information for the 3D human hand, and most recent discriminative approaches [12]–[15] estimate hand pose from a single depth image, which usually leads to sub-optimal models and multi-valued mappings.
1. We propose a novel recurrent architecture named APSM for hand pose estimation, which introduces temporal consistency to alleviate the depth missing challenges.
3. A novel spatial attention model, denoted AM, is introduced to weight the input features, and ablation experiments show that AM helps to improve estimation accuracy.
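The idea of spatial attention acting as feature weighting can be sketched as follows. The text here does not say how AM computes its weights, so this example substitutes a simple stand-in: each voxel is scored by its mean activation across channels, the scores are normalized with a softmax over spatial positions, and every channel is multiplied by the resulting weight map. The function name `spatial_attention`, the flattened voxel layout, and the scoring rule are assumptions for illustration only.

```python
import math


def spatial_attention(features):
    """Weight features by a softmax attention map over spatial positions.

    features: list of C channels, each a flat list of N voxel values
              (a flattened D x H x W grid).
    Returns (weights, weighted_features).
    """
    n = len(features[0])
    # Hypothetical scoring rule: mean activation across channels per voxel.
    scores = [sum(ch[i] for ch in features) / len(features) for i in range(n)]
    # Softmax over positions, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The same spatial weight map rescales every channel.
    weighted = [[v * w for v, w in zip(ch, weights)] for ch in features]
    return weights, weighted
```

In a learned model the scores would typically come from trainable layers instead of a fixed mean; the weighting step itself (one map shared across channels, multiplied elementwise) is the part this sketch is meant to show.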