Abstract

Hand pose estimation from 3D data is a key challenge in computer vision and an essential step for human–computer interaction. Many deep learning-based hand pose estimation methods have made significant progress but pay little attention to the inner interactions of the input data, especially when consuming hand point clouds. This paper therefore proposes an end-to-end capsule-based hand pose estimation network (Capsule-HandNet), which processes hand point clouds directly while taking into account the structural relationships among local parts, including symmetry, junction, and relative location. First, an encoder in Capsule-HandNet extracts multi-level features into a latent capsule by dynamic routing. The latent capsule explicitly represents the structural relationship information of the hand point cloud. Then, a decoder reconstructs a point cloud from the latent capsule to fit the input hand point cloud. This auto-encoder procedure is designed to ensure the effectiveness of the latent capsule. Finally, the hand pose is regressed from a combined feature that consists of the global feature and the latent capsule. Capsule-HandNet is evaluated on public hand pose datasets using the mean joint error and the fraction-of-frames metrics. The mean joint errors of Capsule-HandNet on the MSRA and ICVL datasets are 8.85 mm and 7.49 mm, respectively, and Capsule-HandNet outperforms state-of-the-art methods at most thresholds under the fraction-of-frames metric. The experimental results demonstrate the effectiveness of Capsule-HandNet for 3D hand pose estimation.
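The two evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. It assumes the definitions commonly used in hand pose benchmarks: the mean joint error averages the Euclidean distance over all joints and frames, and a frame counts toward the fraction of frames when its worst joint error is within the threshold. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def joint_errors(pred, gt):
    """Per-joint Euclidean distances for one frame; pred and gt are lists of (x, y, z)."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]

def mean_joint_error(pred_frames, gt_frames):
    """Average per-joint distance over all joints of all frames (same unit as the input)."""
    errors = [e for p, g in zip(pred_frames, gt_frames) for e in joint_errors(p, g)]
    return sum(errors) / len(errors)

def fraction_of_frames(pred_frames, gt_frames, threshold):
    """Share of frames whose maximum joint error does not exceed the threshold (assumed definition)."""
    good = sum(
        1 for p, g in zip(pred_frames, gt_frames)
        if max(joint_errors(p, g)) <= threshold
    )
    return good / len(pred_frames)

# Toy data: two frames with two joints each, coordinates in millimetres.
gt_frames = [[(0, 0, 0), (10, 0, 0)], [(0, 0, 0), (10, 0, 0)]]
pred_frames = [[(3, 4, 0), (10, 0, 0)], [(0, 0, 0), (10, 0, 12)]]

print(mean_joint_error(pred_frames, gt_frames))        # 4.25
print(fraction_of_frames(pred_frames, gt_frames, 10))  # 0.5
```

Sweeping `threshold` over a range of values gives the curve on which methods are usually compared under the fraction-of-frames metric.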

Highlights

  • Along with the development of depth cameras, interaction based on hand poses plays an important role in human–computer interaction [1,2] and has extensive application scenarios

  • Inspired by [15,16], the capsule and the unsupervised dynamic routing mechanism are adopted in Capsule-HandNet, which represents the structural relationships among local parts of hand point clouds explicitly

  • A capsule and dynamic routing based mechanism is employed for hand pose estimation for the first time, which enables the network to learn the structural relationships among the local parts of the hand point cloud


Summary

Introduction

With the development of depth cameras, interaction based on hand poses plays an important role in human–computer interaction [1,2] and has a wide range of application scenarios. This paper proposes Capsule-HandNet, an end-to-end capsule-based network for hand pose estimation from hand point clouds. Inspired by [15,16], Capsule-HandNet adopts capsules and the unsupervised dynamic routing mechanism, which explicitly represent the structural relationships among local parts of hand point clouds. The auto-encoder phase uses a symmetric evaluation and is designed to optimize the latent capsule before the hand pose is regressed. A capsule and dynamic routing based mechanism is employed for hand pose estimation for the first time, which enables the network to learn the structural relationships among the local parts of the hand point cloud. An auto-encoder with a symmetric Chamfer distance metric is designed for hand feature optimization to acquire an effective latent capsule.
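As an illustration of the symmetric Chamfer distance mentioned above, the following is a minimal sketch in plain Python. It assumes one common form of the metric: the mean squared nearest-neighbour distance from each cloud to the other, summed over both directions. The exact variant used by Capsule-HandNet (squared or unsquared, mean or sum) is not stated in this summary, and the function names are hypothetical.

```python
import math

def one_sided(src, dst):
    """Mean squared distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: both directions are added, so swapping a and b gives the same value."""
    return one_sided(a, b) + one_sided(b, a)

# Toy clouds: the second cloud lacks the point at x = 1.
cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0)]

print(chamfer_distance(cloud_a, cloud_b))  # 0.5
print(chamfer_distance(cloud_a, cloud_a))  # 0.0
```

This brute-force version costs O(|a|·|b|) distance evaluations. It is meant only to make the definition concrete; a training loss would normally be written with batched tensor operations.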

Related Work
Deep Learning on Point Clouds
Hand Pose Estimation
Methodology
Hand Point Cloud Preprocessing
Capsule and Dynamic Routing
Hand Pose Estimation Network
Experiments
Datasets and Settings
Comparisons with State-of-the-Art Methods
Method
Ablation Study
Runtime and Model Size
Findings
Conclusions
