Abstract

Hand pose estimation plays an essential role in sign language understanding and human-computer interaction. Existing RGB-based 2D hand pose estimation methods learn the joint locations from a single resolution, which is not suitable for different hand sizes. To tackle this problem, we propose a new deep learning-based framework that consists of two main modules. The first one presents a segmentation-based approach to detect the hand skeleton and localize the hand bounding box. The second module regresses the 2D joint locations through a multi-scale heatmap regression approach that exploits the predicted hand skeleton as a constraint to guide the model. Moreover, we construct a new dataset that is suitable for both hand detection and pose estimation tasks. It includes the hand bounding boxes, the 2D keypoints, the 3D poses and their corresponding RGB images. We conduct extensive experiments on two datasets to validate our method. Qualitative and quantitative results demonstrate that the proposed method outperforms the state-of-the-art and recovers the pose even in cluttered images and complex poses.

Highlights

  • The hands are one of the most important and intuitive interaction tools for humans

  • We focus on the problem of 2D hand pose estimation from a single RGB image

  • We propose a new learning-based method for 2D hand pose estimation


Summary

INTRODUCTION

The hands are one of the most important and intuitive interaction tools for humans. Solving the hand pose estimation problem is crucial for many applications, including human-computer interaction, virtual reality and augmented reality. Earlier works in hand tracking use special hardware to track the hand, such as gloves and visual markers. These types of solutions are expensive and restrict the applications to limited scenarios. This is mainly due to its nonlinear nature, requiring many iterations and a lot of data for convergence. To overcome these limitations, recent works use probability density maps such as the heatmap to solve human and hand pose estimation problems [7], [10], [11]. The network output is supervised on different scales to ensure accurate poses for different hand image sizes. This strategy helps the model better learn the contextual and location information. Results demonstrate that our method generates accurate poses and outperforms three state-of-the-art methods [8], [14], [15].
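To make the idea of heatmap supervision at several scales concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian target centered on each joint, a square input image, and three hypothetical output resolutions (64, 32 and 16 pixels) with a fixed standard deviation; the function names, resolutions and sigma value are illustrative choices that do not come from the paper.

```python
import math


def gaussian_heatmap(x, y, size, sigma):
    """Return a size x size heatmap (list of rows) with a Gaussian peak at (x, y)."""
    return [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
            for col in range(size)
        ]
        for row in range(size)
    ]


def multi_scale_targets(joint_xy, image_size, scales=(64, 32, 16), sigma=1.5):
    """Build one target heatmap per output resolution for a single joint.

    joint_xy is given in pixel coordinates of the (square) input image.
    The scales and sigma are assumed values, chosen only for illustration.
    """
    targets = {}
    for size in scales:
        factor = size / image_size  # maps image coordinates to heatmap coordinates
        targets[size] = gaussian_heatmap(
            joint_xy[0] * factor, joint_xy[1] * factor, size, sigma
        )
    return targets


def argmax_2d(heatmap):
    """Return the (x, y) position of the largest value in a heatmap."""
    _, row, col = max(
        (value, row, col)
        for row, line in enumerate(heatmap)
        for col, value in enumerate(line)
    )
    return col, row


def decode_joint(heatmap, image_size):
    """Map the heatmap peak back to input-image pixel coordinates."""
    x, y = argmax_2d(heatmap)
    factor = image_size / len(heatmap)
    return x * factor, y * factor


if __name__ == "__main__":
    targets = multi_scale_targets((128, 64), image_size=256)
    for size, heatmap in targets.items():
        print(size, argmax_2d(heatmap), decode_joint(heatmap, 256))
```

In a training loop, a loss such as the mean squared error would typically be computed between the predicted and target heatmap at each resolution and then summed, so that both coarse and fine outputs receive a supervision signal. How the paper weights or combines these terms is not stated in this summary.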

SKELETON DETECTION AND BOUNDING BOX
MULTI-SCALE HEATMAPS REGRESSION
Implementation details
Hand detection
Pose estimation
Methods
Findings
CONCLUSION
