This work addresses the challenging problem of unconstrained 3D hand pose estimation using monocular RGB images. Most of the existing approaches assume some prior knowledge of hand (such as hand locations and side information) is available for 3D hand pose estimation. This restricts their use in unconstrained environments. Therefore, we present an end-to-end framework that robustly predicts hand prior information and accurately infers 3D hand pose by learning ConvNet models while only using keypoint annotations. To enhance the hand detector’s robustness, we propose a novel keypoint-based method to simultaneously predict hand regions and side labels, unlike existing methods that suffer from background color confusion caused by using segmentation or detection-based technology. Moreover, inspired by the human hand’s biological structure, we introduce two geometric constraints directly into the 3D coordinates prediction that further improves its performance. Experimental results show that our proposed framework outperforms the state-of-art methods on standard benchmark datasets while providing robust predictions.
Read full abstract