Weakly-supervised pre-training for 3D human pose estimation via perspective knowledge

Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu

doi:10.1016/j.patcog.2023.109497

Open Access

https://doi.org/10.1016/j.patcog.2023.109497

Abstract

Modern deep learning-based 3D pose estimation approaches require large amounts of 3D pose annotations. However, existing 3D datasets lack diversity, which limits both the performance and the generalization ability of current methods. Although existing methods utilize 2D pose annotations to help 3D pose estimation, they mainly focus on extracting 2D structural constraints from 2D poses and ignore the 3D information hidden in the images. In this paper, we propose a novel method to extract weak 3D information directly from 2D images without 3D pose supervision. First, we utilize 2D pose annotations and perspective prior knowledge to generate the relative depth of human joints. Then, we collect a 2D pose dataset (MCPC) and generate relative depth labels for it. Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy that learns to distinguish the depth relationship between two points in an image. WSP enables learning the relative depth of two keypoints on a large number of in-the-wild images, which improves both depth prediction and generalization for 3D human pose estimation. After fine-tuning the pose model on 3D pose datasets, WSP achieves state-of-the-art results on two widely-used benchmarks.
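
To make the idea of learning "the depth relationship between two points" concrete, the following is a minimal sketch of a pairwise ordinal (ranking) loss on predicted joint depths. The abstract does not specify the loss used by WSP, so this formulation is an assumption chosen for illustration, and the names pairwise_depth_ranking_loss, mean_ranking_loss, and the relation encoding (+1, -1, 0) are hypothetical rather than taken from the paper.

```python
import math


def pairwise_depth_ranking_loss(z_i, z_j, relation):
    """Illustrative ordinal loss for one pair of keypoints (an assumption,
    not the paper's stated loss).

    z_i, z_j : predicted depths of the two keypoints.
    relation : +1 if keypoint i is labeled farther than keypoint j,
               -1 if it is labeled closer,
                0 if the two are labeled as roughly the same depth.
    """
    diff = z_i - z_j
    if relation == 0:
        # Similar depth: penalize any predicted gap.
        return diff * diff
    # Ordered pair: softplus(-relation * diff), computed in a stable form.
    x = -relation * diff
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mean_ranking_loss(pred_depths, pairs):
    """Average the pairwise loss over (i, j, relation) label triples."""
    total = sum(
        pairwise_depth_ranking_loss(pred_depths[i], pred_depths[j], r)
        for i, j, r in pairs
    )
    return total / len(pairs)
```

Under this sketch, a prediction that agrees with the labeled ordering receives a smaller loss than one that contradicts it, so a model can be trained from relative depth labels alone, without metric 3D annotations.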

Full Text