3D Human Pose Estimation via Spatio-Temporal Matching from Monocular RGB Images

Jielu Yan, Bin Fang, Ke Xu, Ming Liang Zhou

doi:10.1142/s0218001422550175

Abstract

Three-dimensional (3D) human pose estimation aims to locate the 3D keypoints of individuals in input RGB images. In two-dimensional (2D) human pose estimation, the majority of methods infer 2D poses from 2D heatmaps. However, this approach is hard to extend to 3D pose inference, because estimating 3D heatmaps sharply increases the computational load. To address this problem, we propose the STM-CNN method, which estimates a reconstruction coefficient matrix and uses it to calculate the final 3D pose instead of estimating 3D heatmaps, thereby decreasing the computational load. First, STM-CNN performs a preprocessing procedure to calculate a set of shape and weight bases. Second, STM-CNN infers a 2D matrix, called the reconstruction coefficient matrix, using the STM-CNN architecture. Third, STM-CNN combines the precomputed shape and weight bases with the estimated reconstruction coefficient matrix to calculate the final 3D pose. STM-CNN achieves better performance than state-of-the-art methods on Human3.6M.
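
The three steps can be illustrated with a toy sketch. The abstract does not give the reconstruction formula, so the bilinear form used here (poses = W · C · Bᵀ) is only one plausible reading and may differ from the paper's formulation. The function names, the matrix shapes, the 17-joint skeleton, and the 64-bin heatmap resolution are all assumptions chosen for illustration.

```python
# Toy sketch only: the bilinear form, names, and sizes are assumptions,
# not details taken from the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def reconstruct_sequence(weight_bases, coeff, shape_bases):
    """Assumed bilinear reconstruction: poses = W @ C @ B^T.

    weight_bases: F x Kw  (one row per frame; precomputed in step 1)
    coeff:        Kw x Ks (the 2D matrix the network predicts in step 2)
    shape_bases:  D x Ks  (D = 3 * number of joints; precomputed in step 1)
    Returns F x D: one flattened 3D pose per frame (step 3).
    """
    return matmul(matmul(weight_bases, coeff), transpose(shape_bases))

def heatmap_values(num_joints, resolution, dims):
    """Number of values a dense heatmap head would have to predict."""
    return num_joints * resolution ** dims

# Hypothetical tiny example: 2 frames, 2 weight bases, 2 shape bases, 1 joint.
W = [[1.0, 0.0], [0.5, 0.5]]
C = [[1.0, 0.0], [0.0, 2.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
poses = reconstruct_sequence(W, C, B)

# Output-size comparison under the assumed 17 joints and 64 bins per axis.
values_2d = heatmap_values(17, 64, 2)
values_3d = heatmap_values(17, 64, 3)
```

Under these assumed sizes, a dense 3D heatmap head predicts 64 times as many values as its 2D counterpart, whereas the size of the coefficient matrix depends only on the number of bases. This is the motivation the abstract gives for regressing coefficients instead of heatmaps.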
