Cascaded Hierarchical CNN for RGB-Based 3D Hand Pose Estimation

Shiming Dai, Wei Liu, Jihao Zhang, Lili Fan, Wenji Yang

doi:10.1155/2020/8432840

Abstract

3D hand pose estimation provides basic information about gestures and is of great significance in the fields of Human-Machine Interaction (HMI) and Virtual Reality (VR). In recent years, 3D hand pose estimation from a single depth image has made great progress owing to the development of depth cameras. However, 3D hand pose estimation from a single RGB image remains a highly challenging problem. In this work, we propose a novel four-stage cascaded hierarchical CNN (4CHNet), which uses a hierarchical network to decompose hand pose estimation into finger pose estimation and palm pose estimation, extracts finger features and palm features separately, and finally fuses them to estimate the 3D hand pose. Compared with direct estimation methods, the hand features extracted by the hierarchical network are more representative. Furthermore, concatenating the stages of the network and training them end to end lets the stages benefit from one another and improve together. Experimental results on two public datasets demonstrate that our 4CHNet significantly improves the accuracy of 3D hand pose estimation from a single RGB image.
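
The decomposition described in the abstract (separate finger and palm branches whose features are fused before regressing the 3D pose) can be illustrated with a minimal pure-Python sketch of the data flow only. It rests on several assumptions that do not come from the paper: 21 hand keypoints, a shared feature vector as the input to both branches, random toy fully connected layers standing in for the CNN branches, and fusion by simple concatenation. The names `estimate_hand_pose`, `W_finger`, `W_palm`, and `W_fuse` are hypothetical; this is not the authors' implementation.

```python
import random

NUM_JOINTS = 21  # assumed keypoint count (a common convention, not stated above)
FEAT_DIM = 16    # assumed size of the shared backbone feature vector

random.seed(0)


def dense(in_dim, out_dim):
    """Random in_dim x out_dim weight matrix for a toy fully connected layer."""
    return [[random.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


def matvec(x, W):
    """Row vector x (length in_dim) times matrix W (in_dim x out_dim)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def relu(v):
    return [max(0.0, a) for a in v]


# Hypothetical branch widths; real branches would be convolutional sub-networks.
W_finger = dense(FEAT_DIM, 8)          # finger branch
W_palm = dense(FEAT_DIM, 4)            # palm branch
W_fuse = dense(8 + 4, NUM_JOINTS * 3)  # fusion head: x, y, z for every joint


def estimate_hand_pose(backbone_feat):
    finger_feat = relu(matvec(backbone_feat, W_finger))  # finger-level features
    palm_feat = relu(matvec(backbone_feat, W_palm))      # palm-level features
    fused = finger_feat + palm_feat                      # fusion by list concatenation
    flat = matvec(fused, W_fuse)
    return [tuple(flat[3 * k:3 * k + 3]) for k in range(NUM_JOINTS)]


feat = [random.gauss(0.0, 1.0) for _ in range(FEAT_DIM)]
pose = estimate_hand_pose(feat)
print(len(pose), len(pose[0]))  # 21 3
```

Keeping the two branches separate until the fusion step is what lets each one specialize in finger-level or palm-level cues before the joint regression sees both.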

Highlights

With the rapid development of computer vision technology, 3D hand pose estimation is increasingly applied in Human-Machine Interaction (HMI), Virtual Reality (VR), and Augmented Reality (AR) [1,2,3]. This has made vision-based 3D hand pose estimation an active research area [4], in which great progress has been achieved over years of research [5,6,7,8,9,10,11,12,13].
Owing to the back-propagation mechanism of the neural network, the stages promote one another and improve together during training. The hierarchical estimation stage processes the hierarchically extracted hand features to obtain deeper, more effective, and more representative feature information, and fuses the feature information of all layers to estimate the 3D hand pose, which improves estimation accuracy.
Based on the cascaded CNN and the hierarchical CNN, we propose a novel four-stage cascaded hierarchical CNN (4CHNet) for estimating the 3D hand pose from a single RGB image.
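
A four-stage cascade can be viewed as a chain in which each stage consumes the output of the stage before it. The sketch below keeps every intermediate output so that a loss can be attached to each stage; summing per-stage losses (intermediate supervision) is one common way to train a cascade, but whether 4CHNet weights or supervises its stages this way is an assumption. The stage bodies are placeholder arithmetic rather than CNNs, and `run_cascade`, `cascade_loss`, and `stage1` to `stage4` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Stage = Callable[[Vector], Vector]


def run_cascade(stages: Sequence[Stage], x: Vector) -> List[Vector]:
    """Apply the stages in order, feeding each one the previous stage's output.

    All intermediate outputs are returned so each stage can receive its own loss.
    """
    outputs: List[Vector] = []
    for stage in stages:
        x = stage(x)
        outputs.append(x)
    return outputs


def cascade_loss(outputs: List[Vector], targets: List[Vector],
                 weights: Optional[List[float]] = None) -> float:
    """Weighted sum of per-stage squared errors (assumed intermediate supervision)."""
    if weights is None:
        weights = [1.0] * len(outputs)
    total = 0.0
    for w, out, tgt in zip(weights, outputs, targets):
        total += w * sum((o - t) ** 2 for o, t in zip(out, tgt))
    return total


# Placeholder stages: simple arithmetic standing in for four CNN stages.
def stage1(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def stage2(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


def stage3(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def stage4(x: Vector) -> Vector:
    return [sum(x)]


outs = run_cascade([stage1, stage2, stage3, stage4], [1.0, 2.0])
print(outs[-1])  # [8.0]
```

Returning every stage's output, rather than only the last one, is what makes it possible to supervise earlier stages directly while still training the whole chain together.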

Summary

Introduction

Research on 3D hand pose estimation based on depth images is progressing rapidly with the development of depth cameras [14,15,16]. By comparison, the results of current RGB-based 3D hand pose estimation are still not satisfactory. We present a four-stage cascaded hierarchical CNN (4CHNet) for RGB-based 3D hand pose estimation. Owing to the back-propagation mechanism, the stages benefit one another and improve together during training, which helps achieve global optimization and refine the model.
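
The claim that the stages improve together during end-to-end training rests on the chain rule: the loss computed at the final stage yields a gradient for the earlier stage's parameters as well. A toy two-stage scalar cascade makes this concrete. It assumes scalar weights in place of CNN layers, a squared-error loss on the final output only, and plain gradient descent; `forward`, `loss_and_grads`, and `train` are hypothetical names.

```python
def forward(w1, w2, x):
    h = w1 * x  # stage 1: toy scalar "feature"
    y = w2 * h  # stage 2 consumes stage 1's output
    return h, y


def loss_and_grads(w1, w2, x, target):
    """Squared error at the final stage and its gradients for both stages."""
    h, y = forward(w1, w2, x)
    err = y - target
    loss = err ** 2
    grad_w2 = 2.0 * err * h       # dL/dw2
    grad_w1 = 2.0 * err * w2 * x  # dL/dw1: chain rule through stage 2
    return loss, grad_w1, grad_w2


def train(w1, w2, x, target, lr=0.05, steps=50):
    """Plain gradient descent updating both stages jointly (end to end)."""
    for _ in range(steps):
        _, g1, g2 = loss_and_grads(w1, w2, x, target)
        w1 -= lr * g1  # stage 1 is updated by the final-stage loss
        w2 -= lr * g2
    final_loss, _, _ = loss_and_grads(w1, w2, x, target)
    return w1, w2, final_loss


w1, w2, final_loss = train(0.5, 0.5, x=1.0, target=2.0)
print(f"stage 1: {w1:.3f}, stage 2: {w2:.3f}, loss: {final_loss:.2e}")
```

Because `grad_w1` contains `w2`, the update to stage 1 depends on what stage 2 currently does with its output, and vice versa through `h`; this coupling is the sense in which jointly trained stages progress together.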

Methods

Results

Conclusion