Abstract

Hand pose estimation based on 2D RGB images has drawn increasing research interest due to its many practical applications, such as Human-Computer Interaction (HCI) and Virtual Reality (VR). However, most existing methods focus on learning hand structure and keypoint representations, and do not fully exploit the interdependency between hand joints in 2D occluded hand pose estimation. In this paper, we propose an adaptive joint interdependency learning network (AJIL) for 2D occluded hand pose estimation that adaptively learns hand joint interdependency through three sub-networks. First, a cascade multi-task mask-learning sub-network learns the hand pose structure. Then, a modified transformer encoder exploits the spatial relationships among the hand joints. Lastly, the joint correlation is obtained from multi-view hand pose images via 21 long short-term memory (LSTM) networks. Extensive experiments are conducted on three datasets: the widely used CMU Panoptic Hand dataset, the Large-Scale Multiview Hand Pose dataset, and our newly established pen-holding hand pose (PHHP) image dataset. Experimental results show that our proposed method achieves very competitive 2D hand pose estimation performance compared with the baseline models.

Keywords: Hand pose estimation; 2D occluded hand pose estimation; Hand joint interdependency learning
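To illustrate the joint-interdependency idea, the sketch below (in PyTorch) shows a minimal transformer encoder operating over 21 per-joint feature tokens, so that each joint can attend to the others. This is a hedged illustration rather than the authors' implementation: the feature dimension, layer count, head count, and the `JointRelationEncoder` name are assumptions made for the example.

```python
# Minimal sketch (not the paper's code): self-attention over 21 hand-joint tokens
# to model joint interdependency, followed by per-joint 2D regression.
import torch
import torch.nn as nn

NUM_JOINTS = 21   # standard 21-keypoint hand model
FEAT_DIM = 128    # assumed per-joint feature size

class JointRelationEncoder(nn.Module):
    def __init__(self, feat_dim=FEAT_DIM, num_layers=2, num_heads=4):
        super().__init__()
        # One learnable positional embedding per joint token
        self.pos_embed = nn.Parameter(torch.zeros(1, NUM_JOINTS, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, 2)  # regress 2D (x, y) per joint

    def forward(self, joint_feats):
        # joint_feats: (batch, 21, feat_dim) features pooled per joint
        x = joint_feats + self.pos_embed
        x = self.encoder(x)   # self-attention mixes information across joints
        return self.head(x)   # (batch, 21, 2) refined 2D joint positions

if __name__ == "__main__":
    model = JointRelationEncoder()
    feats = torch.randn(4, NUM_JOINTS, FEAT_DIM)
    print(model(feats).shape)  # torch.Size([4, 21, 2])
```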
