Multi-person 3D pose estimation from a single image captured by a fisheye camera

Yahui Zhang, Shaodi You, Sezer Karaoglu, Theo Gevers

doi:10.1016/j.cviu.2022.103505

Open Access

https://doi.org/10.1016/j.cviu.2022.103505

Abstract

Multi-person 3D pose estimation with absolute depths for a fisheye camera is a challenging task with valuable applications in daily life, especially video surveillance. However, to the best of our knowledge, this problem has not been explored so far, leaving a gap in practical applications. In this work, we first propose a method for multi-person 3D pose estimation from a single image taken by a fisheye camera. Our method consists of two branches to estimate absolute 3D human poses: (1) a 2D-to-3D lifting module that predicts root-relative 3D human poses (HPoseNet); (2) a root regression module that estimates absolute root locations in the camera coordinate system (HRootNet). Finally, we propose a fisheye re-projection module that connects the two branches without using ground-truth camera parameters, alleviating the impact of image distortions on 3D pose estimation and further regularizing the predicted absolute 3D poses. Experimental results demonstrate that our method achieves state-of-the-art performance on two public multi-person 3D pose datasets with synthetic fisheye images and on our newly collected dataset with real fisheye images. The code and new dataset will be made publicly available.
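
To make the two-branch idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an absolute pose is obtained by adding the absolute root location to the root-relative joints, and that re-projection follows the standard equidistant fisheye model r = f * theta. The abstract does not state which distortion model the method uses, and the method itself avoids ground-truth camera parameters, so the function names and the values of `f`, `cx`, and `cy` below are hypothetical and purely illustrative.

```python
import numpy as np


def absolute_pose(rel_pose, root):
    """Shift a root-relative pose (J, 3) by the absolute root location (3,)."""
    return np.asarray(rel_pose, dtype=float) + np.asarray(root, dtype=float)


def project_equidistant(points, f, cx, cy):
    """Project camera-space points (J, 3) to pixels with r = f * theta.

    Assumption: equidistant fisheye model with hypothetical focal length f
    and principal point (cx, cy).
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)        # distance from the optical axis
    theta = np.arctan2(rho, z)  # angle between the ray and the optical axis
    r = f * theta               # radial distance in the image
    # Points on the optical axis have x = y = 0 and map to the principal point.
    safe_rho = np.where(rho > 0, rho, 1.0)
    u = cx + r * x / safe_rho
    v = cy + r * y / safe_rho
    return np.stack([u, v], axis=1)


# Two joints: the root itself and one joint 0.5 m away along -y (illustrative values).
rel_pose = [[0.0, 0.0, 0.0], [0.0, -0.5, 0.0]]
root = [1.0, 0.0, 1.0]
pose_abs = absolute_pose(rel_pose, root)
pixels = project_equidistant(pose_abs, f=300.0, cx=500.0, cy=500.0)
```

In a training setup along these lines, the distance between such re-projected joints and the detected 2D joints could serve as a consistency term linking the two branches; how the paper actually formulates this is not specified in the abstract.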
