Abstract

Hand pose estimation is a crucial component of a wide range of augmented reality and human-computer interaction applications. Predicting the 3D hand pose from a single RGB image is challenging due to occlusions and depth ambiguities. Graph Convolutional Network (GCN)-based methods exploit the structural similarity between graphs and the hand skeleton to model kinematic dependencies between joints. These techniques rely on predefined or globally learned joint relationships, which may fail to capture pose-dependent constraints. To address this problem, we propose a two-stage GCN-based framework that learns per-pose (per-image) relationship constraints. Specifically, the first stage quantizes the 2D/3D space and classifies the joints into 2D/3D blocks according to their locality. This spatial dependency information guides the regression branch toward reliable 2D and 3D pose estimates. The second stage further refines the 3D estimate through a GCN-based module that uses an adaptive nearest-neighbor algorithm to determine joint relationships. Extensive experiments show that our multi-stage GCN approach yields an efficient model that produces accurate 2D/3D hand poses and outperforms the state of the art on two public datasets.
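
To make the second-stage idea concrete, the following is a minimal sketch (not the authors' code) of how a per-pose adjacency could be built with a nearest-neighbor rule over the estimated 3D joints and then used in one graph-convolution step. The joint count, the value of k, the feature size, and all function names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: per-pose k-nearest-neighbor adjacency + one GCN layer.
# All sizes and names are illustrative assumptions.
import numpy as np

def knn_adjacency(joints_3d: np.ndarray, k: int = 4) -> np.ndarray:
    """Row-normalized adjacency built from each joint's k nearest joints.

    joints_3d: (J, 3) array of estimated 3D joint positions for one hand.
    """
    J = joints_3d.shape[0]
    # Pairwise Euclidean distances between joints.
    diff = joints_3d[:, None, :] - joints_3d[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = np.zeros((J, J))
    for i in range(J):
        # Closest k joints other than the joint itself (index 0 is the joint).
        neighbors = np.argsort(dist[i])[1:k + 1]
        adj[i, neighbors] = 1.0
    adj += np.eye(J)                       # self-connections
    adj /= adj.sum(axis=1, keepdims=True)  # row normalization
    return adj

def graph_conv(features: np.ndarray, adj: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One GCN layer: aggregate neighbor features, then a linear projection with ReLU."""
    return np.maximum(adj @ features @ weights, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    joints = rng.normal(size=(21, 3))   # 21 hand joints (assumed layout)
    feats = rng.normal(size=(21, 64))   # per-joint features from a backbone (assumed)
    w = rng.normal(size=(64, 64)) * 0.1
    refined = graph_conv(feats, knn_adjacency(joints, k=4), w)
    print(refined.shape)                # (21, 64)
```

Because the adjacency is recomputed from each image's own joint estimates, the graph structure adapts to the pose rather than being fixed in advance, which is the property the abstract attributes to the second stage.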
