Abstract

Egocentric hand pose estimation is important for wearable cameras, which capture hand interactions from a first-person viewpoint. Several recent studies have addressed hand pose estimation using RGB or RGBD sensors. Although these methods provide accurate hand pose estimates, they have notable limitations: RGB-based techniques have intrinsic difficulty in converting relative 3D poses into absolute 3D poses, and RGBD-based techniques work only in indoor environments. Stereo-sensor-based techniques have recently gained increasing attention owing to their potential to overcome these limitations. However, to the best of our knowledge, few techniques and no real datasets are available for egocentric stereo vision. In this paper, we propose a top-down pipeline for estimating absolute 3D hand poses from stereo sensors, together with a novel dataset for training. The pipeline consists of two steps: hand detection, which localizes hand regions, followed by hand pose estimation, which estimates the positions of the hand joints within each region. In particular, for hand pose estimation with a stereo camera, we propose an attention-based architecture called StereoNet, a geometry-based loss function called StereoLoss, and a novel 2D disparity map called StereoDMap for effective stereo feature learning. To collect the dataset, we also propose a novel annotation method that reduces human annotation effort. Our dataset is publicly available at https://github.com/seo0914/SEH. Comprehensive experiments demonstrate the effectiveness of our approach compared with state-of-the-art methods.
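As background for why stereo input enables absolute rather than relative 3D poses, the minimal Python sketch below illustrates standard stereo triangulation: given matched 2D joint predictions in a rectified stereo pair, absolute depth follows from per-joint horizontal disparity. This is not code from the paper; the function name triangulate_joints and all parameter names are hypothetical, and known focal length and baseline are assumed.

    import numpy as np

    def triangulate_joints(joints_left, joints_right, focal_px, baseline_m, cx, cy):
        # joints_left, joints_right: (J, 2) arrays of (u, v) pixel coordinates
        # of the same J hand joints in a rectified left/right image pair.
        # focal_px: focal length in pixels; baseline_m: stereo baseline in meters.
        # cx, cy: principal point of the left camera.
        disparity = joints_left[:, 0] - joints_right[:, 0]   # horizontal disparity (J,)
        disparity = np.clip(disparity, 1e-6, None)           # guard against divide-by-zero
        Z = focal_px * baseline_m / disparity                # absolute depth in meters
        X = (joints_left[:, 0] - cx) * Z / focal_px          # back-project to camera frame
        Y = (joints_left[:, 1] - cy) * Z / focal_px
        return np.stack([X, Y, Z], axis=1)                   # (J, 3) absolute 3D joints

    # Toy example: 21 hand joints with a synthetic 10-pixel disparity.
    jl = np.tile([320.0, 240.0], (21, 1))
    jr = jl - [10.0, 0.0]
    xyz = triangulate_joints(jl, jr, focal_px=500.0, baseline_m=0.06, cx=320.0, cy=240.0)
    # Each joint resolves to Z = 500 * 0.06 / 10 = 3.0 m.

A monocular RGB method has no access to the disparity term, which is why it can only recover pose up to an unknown scale or root depth; a stereo disparity representation such as StereoDMap makes this geometric cue available to the network.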
