Abstract

Estimating 3D hand pose from a single RGB image is a challenging task because of its ill-posed nature (i.e., depth ambiguity). Recently, various generative approaches have been proposed to predict the 3D joints of an RGB hand image by learning a unified latent space between the two modalities (i.e., the RGB image and the 3D joints). However, projecting multi-modal data into a unified latent space is difficult, as modality-specific features usually interfere with learning the optimal latent space. Hence, in this paper, we propose to disentangle the latent space into two sub-latent spaces, a modality-specific latent space and a pose-specific latent space, for 3D hand pose estimation. Our proposed method, namely Disentangled Cross-Modal Latent Space (DCMLS), consists of two variational autoencoder (VAE) networks and auxiliary components that connect the two VAEs to align the underlying hand poses and transfer modality-specific context from RGB to 3D. For the hand pose latent space, we align it across the two modalities by using a cross-modal discriminator trained with an adversarial learning strategy. For the context latent space, we learn a context translator to gain access to the cross-modal context. Experimental results on two widely used public benchmark datasets, RHD and STB, demonstrate that the proposed DCMLS method clearly outperforms state-of-the-art approaches to single-image 3D hand pose estimation.
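To make the described architecture concrete, the following is a minimal PyTorch sketch of the disentangled cross-modal idea: each modality is encoded into a pose-specific latent and a modality-specific (context) latent, a cross-modal discriminator aligns the pose latents adversarially, and a context translator maps RGB context toward the 3D-joint domain. All module names, latent sizes, and layer choices here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the disentangled cross-modal latent-space idea (not the
# authors' code). All dimensions and module structures are assumptions.
import torch
import torch.nn as nn

LATENT_POSE, LATENT_CTX, NUM_JOINTS = 32, 32, 21  # hypothetical sizes


class ImageEncoder(nn.Module):
    """Encodes an RGB hand image into a pose latent and a modality-specific
    (context) latent, each parameterised as a Gaussian (mu, logvar)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose_head = nn.Linear(64, 2 * LATENT_POSE)
        self.ctx_head = nn.Linear(64, 2 * LATENT_CTX)

    def forward(self, img):
        h = self.backbone(img)
        return self.pose_head(h).chunk(2, -1), self.ctx_head(h).chunk(2, -1)


class JointEncoder(nn.Module):
    """Encodes 3D joints into the same pair of disentangled latents."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(NUM_JOINTS * 3, 128), nn.ReLU())
        self.pose_head = nn.Linear(128, 2 * LATENT_POSE)
        self.ctx_head = nn.Linear(128, 2 * LATENT_CTX)

    def forward(self, joints):
        h = self.mlp(joints.flatten(1))
        return self.pose_head(h).chunk(2, -1), self.ctx_head(h).chunk(2, -1)


class PoseDiscriminator(nn.Module):
    """Cross-modal discriminator: predicts which modality a pose latent came
    from, so adversarial training pushes the two pose distributions together."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_POSE, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, z_pose):
        return self.net(z_pose)


class ContextTranslator(nn.Module):
    """Maps the RGB context latent toward the 3D-joint context latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_CTX, 64), nn.ReLU(),
                                 nn.Linear(64, LATENT_CTX))

    def forward(self, z_ctx_rgb):
        return self.net(z_ctx_rgb)


class JointDecoder(nn.Module):
    """Reconstructs 3D joints from the concatenated pose and context latents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_POSE + LATENT_CTX, 128),
                                 nn.ReLU(), nn.Linear(128, NUM_JOINTS * 3))

    def forward(self, z_pose, z_ctx):
        out = self.net(torch.cat([z_pose, z_ctx], dim=-1))
        return out.view(-1, NUM_JOINTS, 3)


def reparameterize(mu, logvar):
    # Standard VAE reparameterisation trick.
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


# Inference path (RGB -> 3D joints): encode the image, translate its context
# into the 3D-joint domain, then decode the pose.
img = torch.randn(2, 3, 128, 128)
enc_img, translator, dec = ImageEncoder(), ContextTranslator(), JointDecoder()
(pose_mu, pose_lv), (ctx_mu, ctx_lv) = enc_img(img)
z_pose = reparameterize(pose_mu, pose_lv)
z_ctx = translator(reparameterize(ctx_mu, ctx_lv))
pred_joints = dec(z_pose, z_ctx)
print(pred_joints.shape)  # torch.Size([2, 21, 3])
```

During training, under these assumptions, the two encoder-decoder paths would be optimised with reconstruction and KL terms, while the pose discriminator and context translator provide the cross-modal alignment losses described in the abstract.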
