Limb Pose Aware Networks for Monocular 3D Pose Estimation

Lele Wu, Zhenbo Yu, Qingshan Liu, Yijiang Liu

doi:10.1109/tip.2021.3136613

Abstract

In monocular 3D pose estimation, limb joints with more degrees of freedom (DOF), such as the wrists and ankles, have larger estimation errors than other joints, such as the hips and thorax. Specifically, errors may accumulate along the physiological structure of each body part, and the trajectories of joints with higher DOF are more complex. To address this problem, we propose a limb pose aware framework that combines a kinematic constraint aware network with a trajectory aware temporal module to improve the 3D prediction accuracy of limb joint positions. We introduce two kinematic constraints, relative bone angles and absolute bone angles: the former models the angular relation between adjacent bones, and the latter models the angular relation between each bone and the camera plane. Together, the two constraints suppress errors accumulated along the limbs. Furthermore, we propose a trajectory aware network, named Hierarchical Transformer, which takes the temporal trajectories of joints as input and produces a fused trajectory estimate. The Hierarchical Transformer is built from Transformer Encoder blocks and aims to improve the fusion of temporal features. With the kinematic constraints and the trajectory network, we alleviate the accumulation of errors along the limbs and achieve promising results. Most off-the-shelf 2D pose estimators can be easily integrated into our framework. Extensive experiments on public datasets validate the effectiveness of the framework, and ablation studies show the contribution of each individual sub-module.
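The abstract defines the two kinematic constraints only in words. The sketch below shows one plausible way to compute them from 3D joint positions, under stated assumptions that are not taken from the paper: bones are parent-to-child vectors, joints are expressed in camera coordinates with the optical axis along +z (so the camera plane has normal (0, 0, 1)), the relative bone angle is the arccosine of the normalized dot product of two adjacent bones, and the absolute bone angle is the signed arcsine of a bone's z-component over its length. The function names and the toy three-joint arm are hypothetical and for illustration only; the authors' exact angle encoding and how it enters the loss may differ.

```python
import numpy as np

def bone_vectors(joints, bones):
    """Return (B, 3) bone vectors pointing from parent joint to child joint.

    joints: (J, 3) array of 3D joint positions in camera coordinates.
    bones:  list of (parent, child) joint-index pairs (hypothetical skeleton).
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    return joints[children] - joints[parents]

def relative_bone_angles(vectors, adjacent, eps=1e-8):
    """Angle in radians between pairs of adjacent bones (0 = collinear, pi = folded back).

    adjacent: list of (i, j) index pairs into `vectors` (assumed adjacency).
    """
    a = vectors[[i for i, _ in adjacent]]
    b = vectors[[j for _, j in adjacent]]
    # Guard against zero-length bones without biasing well-formed ones.
    denom = np.maximum(np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1), eps)
    cos = np.sum(a * b, axis=-1) / denom
    return np.arccos(np.clip(cos, -1.0, 1.0))

def absolute_bone_angles(vectors, eps=1e-8):
    """Signed angle in radians between each bone and the camera (image) plane.

    Assumes the optical axis is +z, so the plane normal is (0, 0, 1);
    positive values mean the bone points away from the camera.
    """
    norms = np.maximum(np.linalg.norm(vectors, axis=-1), eps)
    return np.arcsin(np.clip(vectors[:, 2] / norms, -1.0, 1.0))

# Toy arm: shoulder (0) -> elbow (1) -> wrist (2), with the forearm pointing along +z.
joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
bones = [(0, 1), (1, 2)]  # upper arm, forearm
vecs = bone_vectors(joints, bones)
rel = relative_bone_angles(vecs, [(0, 1)])
absang = absolute_bone_angles(vecs)
print(np.degrees(rel), np.degrees(absang))
```

Here the elbow is bent at a right angle, and the forearm points straight along the optical axis, so it is perpendicular to the camera plane while the upper arm lies parallel to it. Because both quantities depend only on bone directions, an error in one joint changes the angles of the bones attached to it, which is the kind of limb-level signal the constraints are meant to penalize.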
