Abstract
In this paper, we present a novel deep learning-based architecture, within the scope of expert and intelligent systems, that performs accurate real-time 3D hand pose estimation from a single RGB frame, with no need for multiple cameras, multiple points of view, or RGB-D devices. The proposed pipeline is composed of two convolutional neural networks. The first detects the hand in the image. The second infers the 3D positions of the hand joints, thereby recovering the full hand pose. To this end, we captured our own large-scale dataset of hand images with the corresponding 3D joint annotations. The proposal achieves a mean 3D hand pose error below 5 mm on both the proposed dataset and the public Stereo Hand Pose Tracking Benchmark, and it outperforms state-of-the-art methods. We also demonstrate the application of the proposal to robotic hand teleoperation, with high success.
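The reported error can be illustrated with a minimal sketch. It assumes the metric is the mean Euclidean distance between predicted and ground-truth joint positions, which is a common convention for 3D pose evaluation; the abstract does not define the metric, so this may differ from the authors' exact protocol. The function name `mean_joint_error_mm` and the sample coordinates are hypothetical.

```python
import math


def mean_joint_error_mm(predicted, ground_truth):
    """Mean Euclidean distance (in mm) between matching 3D joints.

    Assumed metric: the abstract reports a mean error in mm but does
    not spell out how it is computed.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("joint lists must be non-empty and of equal length")
    # One distance per joint, pairing predicted and ground-truth points.
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Hypothetical three-joint example, coordinates in millimetres.
predicted = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 3.0, 4.0)]
ground_truth = [(0.0, 0.0, 0.0), (10.0, 0.0, 1.0), (20.0, 0.0, 0.0)]
error = mean_joint_error_mm(predicted, ground_truth)
```

Here the per-joint distances are 0 mm, 1 mm, and 5 mm, so `error` is 2.0 mm. Over a full dataset, the same quantity would be averaged across all joints and all frames.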