Abstract

3D hand pose estimation and shape reconstruction aim to recover the coordinates of hand joints and mesh vertices from an image. However, existing methods usually represent the hand mesh vertex features using only the high-level semantic features extracted by the backbone network, which yields a single, impoverished representation of the vertex features and fails to fully exploit the feature information the network extracts. In this paper, we propose a method for real-time 3D hand reconstruction from a single RGB image that enriches the 3D semantic information of the mesh vertices through multi-feature fusion. First, we regress the 2D features of the mesh vertices via Integral Pose Regression (IPR) and treat them as prior information for the 3D features. We then design a Multi-Scale Sampling (MSS) module to extract multi-scale information. Finally, we fuse the 2D prior features, the multi-scale features, and the high-level semantic features extracted by the backbone to form the initial 3D features. In addition, we propose a Multi-Root (MR) loss function to address the imbalance caused by relying on a single root joint. Experimental results show that our network achieves competitive performance on the public FreiHAND and HO-3D datasets, with fast inference and few parameters.
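The abstract does not give implementation details, but the IPR step it references is the standard soft-argmax formulation (Sun et al., 2018): a spatial softmax over per-vertex heatmaps followed by an expectation over pixel coordinates. Below is a minimal PyTorch sketch of that operation; the function name, tensor shapes, and variable names are our illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F


def integral_pose_regression(heatmaps: torch.Tensor) -> torch.Tensor:
    """Soft-argmax (IPR) over per-vertex heatmaps.

    heatmaps: (B, V, H, W) raw scores, one map per mesh vertex.
    Returns:  (B, V, 2) expected (x, y) coordinates in pixel units.
    NOTE: a generic sketch of integral regression, not the paper's exact module.
    """
    B, V, H, W = heatmaps.shape
    # Normalize each heatmap into a spatial probability distribution.
    probs = F.softmax(heatmaps.view(B, V, -1), dim=-1).view(B, V, H, W)
    xs = torch.arange(W, dtype=probs.dtype, device=probs.device)
    ys = torch.arange(H, dtype=probs.dtype, device=probs.device)
    # Marginalize over rows/columns, then take the expectation of each axis.
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # (B, V)
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # (B, V)
    return torch.stack([x, y], dim=-1)
```

Because the expectation is differentiable, the regressed 2D vertex coordinates can be trained end to end and then fused with other features, as the abstract describes.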
