Abstract

Making full use of the spatial information of the depth data is crucial for 3D hand pose estimation from a single depth image. In this paper, we propose a Spatial-aware Stacked Regression Network (SSRN) for fast, robust and accurate 3D hand pose estimation from a single depth image. By adopting a differentiable pose re-parameterization process, our method efficiently encodes the pose-dependent 3D spatial structure of the depth data as spatial-aware representations. Taking such spatial-aware representations as inputs, the stacked regression network utilizes multi-joint spatial context and the 3D spatial relationship between the estimated pose and the depth data to predict a refined hand pose. To further improve the estimation accuracy, we adopt a spatial attention mechanism to reduce the influence of irrelevant features for pose regression. In order to improve the speed of the network, we propose a cross-stage self-distillation mechanism to distill knowledge within the network itself. Experiments on four datasets show that our proposed method achieves state-of-the-art accuracy with high running speed around 330 FPS on a single GPU and 35 FPS on a single CPU.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.