A Geometric Knowledge Oriented Single-Frame 2D-to-3D Human Absolute Pose Estimation Method

Mengxian Hu, Qin Fang, Qingqing Yan, Shu Li, Qijun Chen, Chengju Liu

doi:10.1109/tcsvt.2023.3279291

Abstract

As a critical part of 3D human pose estimation (HPE), establishing the 2D-to-3D lifting mapping is limited by depth ambiguity. Most existing works lack a quantitative analysis of the relative depth expression and the depth ambiguity error expression in the lifting mapping, resulting in low prediction efficiency and poor interpretability. To this end, this paper mines and leverages prior geometric knowledge of these expressions based on the pinhole imaging principle, decoupling the 2D-to-3D lifting mapping and simplifying model training. Specifically, this paper proposes a prior geometric knowledge oriented pose estimation model with a two-branch transformer architecture, which explicitly introduces high-dimensional prior geometric features to improve model efficiency and interpretability. The model converts the regression of spatial coordinates into the prediction of spatial direction vectors between joints, generating multiple feasible solutions to further alleviate depth ambiguity. Moreover, this paper proposes, for the first time, a non-learning-based absolute depth estimation algorithm built on the prior geometric relationship decoupled from the relative depth expression. It establishes multiple independent depth mappings from non-root nodes to the root node to calculate absolute depth candidates, and it is parameter-free, plug-and-play, and interpretable. Experiments show that the proposed pose estimation model achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks with fewer parameters and faster inference, and that the proposed absolute depth estimation algorithm achieves performance comparable to traditional methods without any network parameters. The source code is available at https://github.com/Humengxian/GKONet.
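To illustrate how a parameter-free absolute depth estimate can follow from the pinhole imaging principle, the sketch below derives one root-depth candidate per non-root joint and image axis, then aggregates the candidates. It is a minimal illustration under stated assumptions, not the authors' algorithm: the function names, the use of known camera intrinsics `f` and `c`, the availability of a root-relative 3D pose, and the median aggregation are all assumptions introduced here. For a joint with pixel coordinate u_i, root-relative offset (dX_i, dZ_i), and root pixel coordinate u_r, the pinhole model gives Z_root = (f * dX_i - (u_i - c) * dZ_i) / (u_i - u_r), and likewise for the vertical axis.

```python
import numpy as np

def root_depth_candidates(uv, rel_xyz, f, c, root=0, eps=1e-6):
    """Return one root-depth candidate per (non-root joint, image axis).

    Hypothetical helper for illustration only.
    uv:      (J, 2) pixel coordinates of the joints
    rel_xyz: (J, 3) root-relative 3D joint offsets (same unit as the depth)
    f, c:    (2,) focal lengths and principal point, in pixels (assumed known)
    """
    uv = np.asarray(uv, dtype=float)
    rel = np.asarray(rel_xyz, dtype=float)
    f = np.asarray(f, dtype=float)
    c = np.asarray(c, dtype=float)

    mask = np.arange(len(uv)) != root      # select non-root joints
    du = uv[mask] - uv[root]               # pixel offset from the root, (J-1, 2)
    # Pinhole relation solved for the root depth, per axis:
    #   Z_root * (u_i - u_r) = f * dX_i - (u_i - c) * dZ_i
    num = f * rel[mask, :2] - (uv[mask] - c) * rel[mask, 2:3]
    valid = np.abs(du) > eps               # skip near-degenerate pixel offsets
    return num[valid] / du[valid]

def estimate_root_depth(uv, rel_xyz, f, c, root=0):
    """Aggregate the candidates with a median (an assumed, robust choice)."""
    return float(np.median(root_depth_candidates(uv, rel_xyz, f, c, root)))

if __name__ == "__main__":
    # Synthetic check: project a known pose, then recover the root depth.
    rel = np.array([[0.0, 0.0, 0.0], [0.2, -0.5, 0.1], [-0.3, 0.4, -0.2]])
    xyz = rel + np.array([0.1, -0.2, 4.0])
    f, c = np.array([1000.0, 1000.0]), np.array([500.0, 500.0])
    uv = f * xyz[:, :2] / xyz[:, 2:3] + c
    print(estimate_root_depth(uv, rel, f, c))
```

Because every non-root joint contributes its own candidates, a robust statistic such as the median can suppress joints whose predicted relative depth is poor; other aggregation rules would fit the same structure.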

Full Text