Abstract
Hand pose estimation is a critical technology in computer vision and human-computer interaction. Deep-learning methods require a considerable amount of labeled training data. This paper therefore aims to generate depth hand images from a given ground-truth 3D hand pose. Specifically, the ground truth is a 3D hand pose that contains the hand structure, while the synthesized image has the same size as the training images and a visual appearance similar to that of the training set. The developed method, inspired by progress in generative adversarial networks (GANs) and image style transfer, models the latent statistical relationship between the ground-truth hand pose and the corresponding depth hand image. The synthesized images are shown to be useful for enhancing hand pose estimation performance. Comprehensive experiments on public hand pose datasets (NYU, MSRA, ICVL) show that the developed method outperforms existing works.
Highlights
As human-computer interaction [1,2] has been optimized, computer-vision methods have been adopted to detect the 3D pose of the human hand and its knuckles from an image or image sequence in a non-contact manner
In recent years, based on a fully supervised convolutional neural network [5], some progress has been made in hand pose estimation using data-driven methods [6,7,8,9]
Because the generative adversarial network (GAN) ignores the noise of real depth images, style transfer is used to extract the contours of the synthetic image and the textures of the style image, and to mix the content and style features to obtain the phantom. It can be empirically observed that the style structure eliminates the shadow of the image background
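The mixing of content and style features described in the last highlight can be illustrated with a minimal sketch. The source does not state which losses are used, so the sketch assumes the common Gram-matrix formulation of style transfer: a content loss compares feature maps directly (keeping contours), and a style loss compares channel correlations (keeping texture). The function names, the weights `alpha` and `beta`, and the toy feature values are all hypothetical, and plain Python lists stand in for the feature maps a real network would produce.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    `features` is a list of C channels, each a flat list of N activations.
    Returns a C x C matrix normalised by N (assumed formulation).
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def content_loss(generated, content):
    """Mean squared difference between feature maps (preserves contours)."""
    diffs = [(g - c) ** 2
             for g_ch, c_ch in zip(generated, content)
             for g, c in zip(g_ch, c_ch)]
    return sum(diffs) / len(diffs)


def style_loss(generated, style):
    """Mean squared difference between Gram matrices (matches texture)."""
    g_gen, g_sty = gram_matrix(generated), gram_matrix(style)
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(g_gen, g_sty)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def total_loss(generated, content, style, alpha=1.0, beta=1.0):
    """Weighted mix of content and style terms; alpha and beta are hypothetical."""
    return (alpha * content_loss(generated, content)
            + beta * style_loss(generated, style))


# Toy example: two channels with two spatial positions each.
synthetic = [[1.0, 2.0], [3.0, 4.0]]
# Same activations with the spatial positions swapped.
shuffled = [[2.0, 1.0], [4.0, 3.0]]

print(content_loss(synthetic, shuffled))  # non-zero: the layout differs
print(style_loss(synthetic, shuffled))    # zero: the channel statistics match
```

In this toy case the style term ignores where activations occur and responds only to how channels co-vary, which is why, under this assumed formulation, the contours come from the synthetic image and the texture from the style image.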
Summary
As human-computer interaction [1,2] has been optimized, computer-vision methods have been adopted to detect the 3D pose of the human hand and its knuckles from an image or image sequence in a non-contact manner. By learning and analyzing the 3D motion state of human hands, it is possible to create a more natural and efficient human-computer interaction environment. In recent years, based on fully supervised convolutional neural networks [5], some progress has been made in hand pose estimation using data-driven methods [6,7,8,9]. The human hand has the following characteristics [10,11,12]: multiple degrees of freedom, self-occlusion, and self-similarity in images. There are often only a limited number of manually annotated depth images, in which field experts mark hand joints through strenuous and time-consuming manual processes.