Abstract

Estimating 3D interacting hand poses and shapes from a single RGB image is challenging because it is difficult to distinguish the left and right hands in interacting-hand pose analysis. This paper proposes a network called GroupPoseNet that uses a grouping strategy to address this problem. GroupPoseNet extracts the left- and right-hand features separately, thereby avoiding mutual interference between the interacting hands. Empowered by a novel up-sampling block, called MF-Block, that predicts 2D heat-maps progressively by fusing image features, hand pose features, and multi-scale features, GroupPoseNet is effective and robust to severe occlusions. To achieve effective 3D hand reconstruction, we design a transformer-based inverse kinematics module (termed TikNet) that maps 3D joint locations to the shape and pose parameters of the MANO hand model. Comprehensive experiments on the InterHand2.6M dataset show that GroupPoseNet outperforms existing methods by a significant margin. Additional experiments also demonstrate its good generalization ability across left-hand, right-hand, and interacting-hand pose estimation from a single RGB image. We also show the effectiveness of TikNet through quantitative and qualitative results.
